Nuclease-mediated regulation of gene expression

ABSTRACT

The present disclosure is in the field of genome engineering, particularly targeted modification of the genome of a hematopoietic cell.

CROSS-REFERENCE TO RELATED APPLICATIONS

The present application claims the benefit of U.S. Provisional Application No. 61/903,823, filed Nov. 13, 2013 and U.S. Provisional Application No. 62/042,075, filed Aug. 26, 2014, the disclosures of which are hereby incorporated by reference in their entireties.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted electronically in ASCII format and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Apr. 23, 2021, is named 717317_SA9-601DIV_ST25.txt, and is 58,688 bytes in size.

TECHNICAL FIELD

The present disclosure is in the field of genome engineering, particularly targeted modification of the genome of a hematopoietic cell.

BACKGROUND

When one considers that genome sequencing efforts have revealed that the human genome contains between 20,000 and 25,000 genes, but fewer than 2000 transcriptional regulators, it becomes clear that a number of factors must interact to control gene expression in all its various temporal, developmental and tissue specific manifestations. Expression of genes is controlled by a highly complex mixture of general and specific transcriptional regulators and expression can also be controlled by cis-acting DNA elements. These DNA elements comprise both local DNA elements such as the core promoter and its associated transcription factor binding sites as well as distal elements such as enhancers, silencers, insulators and locus control regions (LCRs) (see Maston et al (2006) Ann Rev Genome Hum Genet 7: 29-50).

Enhancer elements were first identified in the SV40 viral genome, and then found in the human immunoglobulin heavy chain locus. Now known to play regulatory roles in the expression of many genes, enhancers appear to mainly influence temporal and spatial patterns of gene expression. It has also been found that enhancers function in a manner that is not dependent upon distance from the core promoter of a gene, and is not dependent on any specific sequence orientation with respect to the promoter. Enhancers can be located several hundred kilobases upstream or downstream of a core promoter region, where they can be located in an intron sequence, or even beyond the 3′ end of a gene.

Various methods and compositions for targeted cleavage of genomic DNA have been described. Such targeted cleavage events can be used, for example, to induce targeted mutagenesis, induce targeted deletions of cellular DNA sequences, and facilitate targeted recombination at a predetermined chromosomal locus. See, e.g., U.S. Pat. Nos. 8,623,618; 8,034,598; 8,586,526; 6,534,261; 6,599,692; 6,503,717; 6,689,558; 7,067,317; 7,262,054; 7,888,121; 7,972,854; 7,914,796; 7,951,925; 8,110,379; 8,409,861; U.S. Patent Publications 20030232410; 20050208489; 20050026157; 20060063231; 20080159996; 201000218264; 20120017290; 20110265198; 20130137104; 20130122591; 20130177983, 20130177960 and 20150056705, the disclosures of which are incorporated by reference in their entireties for all purposes. These methods often involve the use of engineered cleavage systems to induce a double strand break (DSB) or a nick in a target DNA sequence such that repair of the break by an error-prone process such as non-homologous end joining (NHEJ) or repair using a repair template (homology directed repair or HDR) can result in the knock out of a gene or the insertion of a sequence of interest (targeted integration). This technique can also be used to introduce site specific changes in the genome sequence through use of a donor oligonucleotide, including the introduction of specific deletions of genomic regions, or of specific point mutations or localized alterations (also known as gene correction). Cleavage can occur through the use of specific nucleases such as engineered zinc finger nucleases (ZFN), transcription-activator like effector nucleases (TALENs), or using the CRISPR/Cas system with an engineered crRNA/tracr RNA (‘single guide RNA’) to guide specific cleavage. Further, targeted nucleases are being developed based on the Argonaute system (e.g., from T. thermophilus, known as ‘TtAgo’, see Swarts et al (2014) Nature 507(7491): 258-261), which also may have the potential for uses in genome editing and gene therapy.

Red blood cells (RBCs), or erythrocytes, are the major cellular component of blood. In fact, RBCs account for one quarter of the cells in a human. Mature RBCs lack a nucleus and many other organelles in humans, and are full of hemoglobin, a metalloprotein that functions to carry oxygen to the tissues as well as carry carbon dioxide out of the tissues and back to the lungs for removal. This protein makes up approximately 97% of the dry weight of RBCs and it increases the oxygen carrying ability of blood by about seventy fold. Hemoglobin is a heterotetramer comprising two alpha (α)-like globin chains and two beta (β)-like globin chains and 4 heme groups. In adults the α2β2 tetramer is referred to as Hemoglobin A (HbA) or adult hemoglobin. Typically, the alpha and beta globin chains are synthesized in an approximate 1:1 ratio and this ratio seems to be critical in terms of hemoglobin and RBC stabilization. In a developing fetus, a different form of hemoglobin, fetal hemoglobin (HbF), is produced which has a higher binding affinity for oxygen than Hemoglobin A such that oxygen can be delivered to the baby's system via the mother's blood stream. Fetal hemoglobin also contains two α globin chains, but in place of the adult β-globin chains, it has two fetal gamma (γ)-globin chains (i.e., fetal hemoglobin is α2γ2). At approximately 30 weeks of gestation, the synthesis of gamma globin in the fetus starts to drop while the production of beta globin increases. By approximately 10 months of age, the newborn's hemoglobin is nearly all α2β2 although some HbF persists into adulthood (approximately 1-3% of total hemoglobin). The regulation of the switch from production of gamma- to beta-globin is quite complex, and primarily involves a down-regulation of gamma globin transcription with a simultaneous up-regulation of beta globin transcription.

Genetic defects in the sequences encoding the hemoglobin chains can be responsible for a number of diseases known as hemoglobinopathies, including sickle cell anemia and thalassemias. In the majority of patients with hemoglobinopathies, the genes encoding gamma globin remain present, but expression is relatively low due to normal gene repression occurring around parturition as described above.

It is estimated that 1 in 5000 people in the U.S. have sickle cell disease (SCD), mostly in people of sub-Saharan African descent. There appears to be a benefit for heterozygous carriers of the sickle cell mutation for protection against malaria, so this trait may have been positively selected over time, such that it is estimated that in sub-Saharan Africa, one third of the population has the sickle cell trait. Sickle cell disease is caused by a mutation in the β globin gene as a consequence of which valine is substituted for glutamic acid at amino acid #6 (a GAG to GTG at the DNA level), where the resultant hemoglobin is referred to as “hemoglobin S” or “HbS.” Under lower oxygen conditions, a conformational shift in the deoxy form of HbS exposes a hydrophobic patch on the protein between the E and F helices. The hydrophobic residues of the valine at position 6 of the beta chain in hemoglobin are able to associate with the hydrophobic patch, causing HbS molecules to aggregate and form fibrous precipitates. These aggregates in turn cause the abnormality or ‘sickling’ of the RBCs, resulting in a loss of flexibility of the cells. The sickling RBCs are no longer able to squeeze into the capillary beds and can result in vaso-occlusive crisis in sickle cell patients. In addition, sickled RBCs are more fragile than normal RBCs, and tend towards hemolysis, eventually leading to anemia in the patient.
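As an illustrative sketch only, and not part of the disclosed methods, the single-codon change described above can be written out in a few lines of Python. The two-entry codon table and the function name `substitute_base` are assumptions introduced here for illustration; only the GAG to GTG change and the glutamic acid to valine substitution come from the text.

```python
# Minimal sketch: the sickle cell point mutation at the level of one codon.
# CODON_TABLE is deliberately partial; it holds only the two codons
# named in the text (GAG and GTG).
CODON_TABLE = {"GAG": "Glu", "GTG": "Val"}


def substitute_base(codon: str, index: int, new_base: str) -> str:
    """Return a copy of `codon` with the base at `index` replaced by `new_base`."""
    if not 0 <= index < len(codon):
        raise ValueError("index outside codon")
    return codon[:index] + new_base + codon[index + 1:]


normal_codon = "GAG"
# Replace the middle base (A) with T, as in the GAG to GTG change.
sickle_codon = substitute_base(normal_codon, 1, "T")

print(normal_codon, "->", CODON_TABLE[normal_codon])
print(sickle_codon, "->", CODON_TABLE[sickle_codon])
```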

Treatment and management of sickle cell patients is a life-long proposition involving antibiotic treatment, pain management and transfusions during acute episodes. One approach is the use of hydroxyurea, which exerts its effects in part by increasing the production of gamma globin. Long term side effects of chronic hydroxyurea therapy are still unknown, however, and treatment gives unwanted side effects and can have variable efficacy from patient to patient. Despite an increase in the efficacy of sickle cell treatments, the life expectancy of patients is still only in the mid to late 50's and the associated morbidities of the disease have a profound impact on a patient's quality of life.

Thalassemias are also diseases relating to hemoglobin and typically involve a reduced expression of globin chains. This can occur through mutations in the regulatory regions of the genes or from a mutation in a globin coding sequence that results in reduced expression or reduced levels of functional globin protein. Alpha thalassemias are mainly associated with people of Western Africa and South Asian descent, and may confer malarial resistance. Beta thalassemia is mainly associated with people of Mediterranean descent, typically from Greece and the coastal areas of Turkey and Italy. Treatment of thalassemias usually involves blood transfusions and iron chelation therapy. Bone marrow transplants are also being used for treatment of people with severe thalassemias if an appropriate donor can be identified, but this procedure can have significant risks.

One approach that has been proposed for the treatment of both SCD and beta thalassemias is to increase the expression of gamma globin with the aim to have HbF functionally replace the aberrant adult hemoglobin. As mentioned above, treatment of SCD patients with hydroxyurea is thought to be successful in part due to its effect on increasing gamma globin expression. The first group of compounds discovered to affect gamma globin reactivation activity were cytotoxic drugs. The ability to cause de novo synthesis of gamma-globin by pharmacological manipulation was first shown using 5-azacytidine in experimental animals (DeSimone (1982) Proc Nat'l Acad Sci USA 79(14):4428-31). Subsequent studies confirmed the ability of 5-azacytidine to increase HbF in patients with β-thalassemia and sickle cell disease (Ley, et al., (1982) N. Engl. J. Medicine, 307: 1469-1475, and Ley, et al., (1983) Blood 62: 370-380). In addition, short chain fatty acids (e.g. butyrate and derivatives) have been shown in experimental systems to increase HbF (Constantoulakis et al., (1988) Blood 72(6):1961-1967). Also, there is a segment of the human population with a condition known as ‘Hereditary Persistence of Fetal Hemoglobin’ (HPFH) where elevated amounts of HbF persist in adulthood (10-40% in HPFH heterozygotes (see Thein et al (2009) Hum. Mol. Genet 18 (R2): R216-R223)). This is a rare condition, but in the absence of any associated beta globin abnormalities, is not associated with any significant clinical manifestations, even when 100% of the individual's hemoglobin is HbF. When individuals that have a beta thalassemia also have co-incident HPFH, the expression of HbF can lessen the severity of the disease. Further, the severity of the natural course of sickle cell disease can vary significantly from patient to patient, and this variability, in part, can be traced to the fact that some individuals with milder disease express higher levels of HbF.

One approach to increase the expression of HbF involves identification of genes whose products play a role in the regulation of gamma globin expression. One such gene is BCL11A, first identified because of its role in lymphocyte development. BCL11A encodes a zinc finger protein that is thought to be involved in the developmental stage-specific regulation of gamma globin expression. BCL11A is expressed in adult erythroid precursor cells and down-regulation of its expression leads to an increase in gamma globin expression. In addition, it appears that the splicing of the BCL11A mRNA is developmentally regulated. In embryonic cells, it appears that the shorter BCL11A mRNA variants, known as BCL11A-S and BCL11A-XS, are primarily expressed, while in adult cells, the longer BCL11A-L and BCL11A-XL mRNA variants are predominantly expressed. See, Sankaran et al (2008) Science 322 p. 1839. The BCL11A protein appears to interact with the beta globin locus to alter its conformation and thus its expression at different developmental stages. Use of an inhibitory RNA targeted to the BCL11A gene has been proposed (see, e.g., U.S. Patent Publication 20110182867) but this technology has several potential drawbacks, namely that complete knock down may not be achieved, delivery of such RNAs may be problematic and the RNAs must be present continuously, requiring multiple treatments for life.

Targeting of BCL11A enhancer sequences may provide a mechanism for increasing HbF. For example, genome wide association studies have identified a set of genetic variations at BCL11A that are associated with increased HbF levels. These variations are a collection of SNPs found in non-coding regions of BCL11A that function as a stage-specific, lineage-restricted enhancer region. Further investigation revealed that this BCL11A enhancer is required in erythroid cells for BCL11A expression, but is not required for its expression in B cells (see Bauer et al, (2013) Science 342:253-257). The enhancer region was found within intron 2 of the BCL11A gene, and three areas of DNAseI hypersensitivity (often indicative of a chromatin state that is associated with regulatory potential) in intron 2 were identified. These three areas were identified as “+62”, “+58” and “+55” in accordance with the distance in kilobases from the transcription start site of BCL11A. These enhancer regions are roughly 350 (+55); 550 (+58); and 350 (+62) nucleotides in length (Bauer 2013, ibid).

Thus, there remains a need for additional methods and compositions that can utilize these genome wide association studies for genome editing and the alteration of gene expression, for example to treat hemoglobinopathies such as sickle cell disease and beta thalassemia.

SUMMARY

The present invention describes compositions and methods for use in gene therapy and genome engineering. Specifically, the methods and compositions described relate to inactivating (e.g., by completely or partially abolishing its expression) a gene, for example a gene that acts as a regulator of one or more additional genes. In particular, the invention describes methods and compositions for interfering with enhancer function in a BCL11A gene to diminish or knock out its activity in specific cell lineages. Additionally, the invention provides methods and compositions for interfering with BCL11A enhancer functions wherein the enhancer sequences are not located within the BCL11A gene. The resulting down-regulation of the BCL11A gene in these circumstances in turn results in increased expression of gamma globin.

In some aspects, the invention comprises delivery of at least one nuclease (e.g., a nuclease that binds to a BCL11A enhancer sequence) to a human stem cell or precursor cell (HSC/PC) for the purpose of genome engineering. In certain embodiments, the nuclease recognizes a target sequence comprising at least 9 (e.g., 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 or even more) contiguous base pairs of SEQ ID NO:1, SEQ ID NO:2 or SEQ ID NO:3. Exemplary target sequences are shown in Tables 1, 2, 3, 4 and 6. In certain embodiments, the nuclease comprises a DNA-binding domain comprising a zinc finger protein comprising 4, 5 or 6 zinc finger domains comprising a recognition helix region, for example, the recognition helix regions in the order shown in a single row of Table 3 or Table 6. In other embodiments, the nuclease comprises a TALE protein comprising a plurality of TALE repeat units, each repeat unit comprising a hypervariable diresidue region (RVD), for example the RVDs of the TALE repeat units as shown in a single row of Table 1, Table 2 or Table 4. The nuclease(s) as described herein may further comprise a linker (e.g., between the DNA-binding domain and the cleavage domain), for example a linker as shown in FIGS. 14 and 17.
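The "at least 9 contiguous base pairs" criterion above can be illustrated with a short Python sketch. This is a hedged example under stated assumptions: the reference string is a made-up placeholder, not SEQ ID NO:1, 2 or 3; the function names `longest_shared_run` and `meets_minimum` are hypothetical; and only one strand is compared, so a reverse-complement check would have to be added separately.

```python
# Minimal sketch: does a candidate target share at least `min_len`
# contiguous bases with a reference sequence?
# The sequences below are hypothetical placeholders, not the SEQ ID NOs.

def longest_shared_run(target: str, reference: str) -> int:
    """Length of the longest contiguous stretch of `target` found in `reference`."""
    target, reference = target.upper(), reference.upper()
    best = 0
    for start in range(len(target)):
        # Only test windows longer than the best run found so far.
        end = start + best + 1
        while end <= len(target) and target[start:end] in reference:
            best = end - start
            end += 1
    return best


def meets_minimum(target: str, reference: str, min_len: int = 9) -> bool:
    """True if `target` shares at least `min_len` contiguous bases with `reference`."""
    return longest_shared_run(target, reference) >= min_len


placeholder_reference = "ACGTTGCAAGCTTAGGCATCGA"  # hypothetical sequence
candidate_a = "TTGCAAGCTTAGG"   # fully contained in the placeholder
candidate_b = "TTGCAAGCAAAA"    # diverges after eight matching bases

print(meets_minimum(candidate_a, placeholder_reference))
print(meets_minimum(candidate_b, placeholder_reference))
```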

In some embodiments, the nuclease is delivered as a peptide, while in others it is delivered as a nucleic acid encoding the at least one nuclease. In some embodiments, more than one nuclease is used. In some preferred embodiments, the nucleic acid encoding the nuclease is an mRNA, and in some instances, the mRNA is protected. In some aspects, the mRNA may be chemically modified (See e.g. Kormann et al, (2011) Nature Biotechnology 29(2):154-157). In other aspects, the mRNA may comprise an ARCA cap (see U.S. Pat. Nos. 7,074,596 and 8,153,773). In further embodiments, the mRNA may comprise a mixture of unmodified and modified nucleotides (see U.S. Patent Publication 2012-0195936). The nuclease may comprise a zinc finger nuclease (ZFN), a TALE-nuclease (TALEN) or a CRISPR/Cas nuclease system or a combination thereof. In a preferred embodiment, the nucleic acid encoding the nuclease(s) is delivered to the HSC/PC via electroporation. In some embodiments, the nuclease cleaves at or near the binding site of a transcription factor. In some aspects, the transcription factor is GATA-1.

In some embodiments comprising a nuclease system that utilizes a nucleic acid guide (e.g. CRISPR/Cas; TtAgo), the cell can be contacted with the nucleic acid guide at the same time as it is contacted with the nuclease, prior to contact with the nuclease, or after contact with the nuclease. The cell can be contacted where the nuclease is provided as a polypeptide, an mRNA or a vector (including a viral vector) capable of expression of the gene encoding the nuclease. The guide nucleic acid may be provided as an oligonucleotide (for TtAgo) or RNA (CRISPR/Cas). Further, guide RNA may be provided via an expression system for expression of the guide RNA within the cell. In some aspects, more than one guide RNA is provided (see Mandal et al (2014) Cell Stem Cell 15:643). In some embodiments, two guide RNAs are provided, while in others, more than two (e.g. three, four, five, six, seven, eight, nine, ten or more than ten) are provided. In some aspects, truncated guide RNAs are used to increase specificity (Fu et al (2014) Nature Biotechnol 32(3): 279). Also see U.S. Patent Publication No. 20150056705.

In one aspect, the invention comprises mutated Cas nucleases specific for a BCL11A enhancer. In some embodiments, these mutant Cas nucleases are Cas9 nucleases, and have altered functionality. In some embodiments, the Cas9 protein is mutated in the HNH domain, rendering it unable to cleave the DNA strand that is complementary to the guide RNA. In other embodiments, the Cas9 is mutated in the RuvC domain, making it incapable of cleaving the non-complementary DNA strand. These mutations can result in the creation of Cas9 nickases. In some embodiments, two Cas nickases are used with two separate guide RNAs to target a DNA, which results in two nicks in the target DNA at a specified distance apart. In other embodiments, both the HNH and RuvC endonuclease domains are altered to render a Cas9 protein which is unable to cleave a target BCL11A enhancer DNA.
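The relationship between the two Cas9 nuclease domains and the resulting cleavage behavior described in this paragraph can be summarized as a small decision table. The following Python sketch is illustrative only; the function names `classify_cas9` and `nick_spacing`, the return strings, and the example coordinates are assumptions made for this example, with the second domain referred to by its usual name, RuvC.

```python
# Minimal sketch: cleavage behavior of Cas9 as a function of which
# nuclease domains remain active, following the paragraph above.

def classify_cas9(hnh_active: bool, ruvc_active: bool) -> str:
    """Describe the expected activity of a Cas9 variant."""
    if hnh_active and ruvc_active:
        return "nuclease: cleaves both strands"
    if ruvc_active:
        # HNH inactivated: the strand complementary to the guide RNA is not cut.
        return "nickase: cleaves only the non-complementary strand"
    if hnh_active:
        # RuvC inactivated: the non-complementary strand is not cut.
        return "nickase: cleaves only the strand complementary to the guide RNA"
    return "no cleavage: both nuclease domains inactivated"


def nick_spacing(nick_a: int, nick_b: int) -> int:
    """Distance in base pairs between two nick positions (paired nickases)."""
    return abs(nick_a - nick_b)


for hnh in (True, False):
    for ruvc in (True, False):
        print(f"HNH active={hnh}, RuvC active={ruvc}: {classify_cas9(hnh, ruvc)}")

# Hypothetical coordinates for two nicks made with two separate guide RNAs.
print(nick_spacing(1200, 1245))
```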

In another aspect, the methods and compositions of the invention comprise truncations of the Cas9 protein. In one embodiment, the Cas9 protein is truncated such that one or more of the Cas9 functional domains are removed. In one embodiment, the removal of part or one of the nuclease domains renders the Cas nuclease a nickase. In one embodiment, the Cas9 comprises only the domain responsible for interaction with the crRNA or sgRNA and the target DNA.

In still further aspects, the methods and compositions of the invention also comprise fusion proteins wherein the Cas9 protein, or truncation thereof, is fused to a functional domain. In some aspects, the functional domain is an activation or a repression domain. In other aspects, the functional domain is a nuclease domain. In some embodiments, the nuclease domain is a FokI endonuclease domain (e.g. Tsai (2014) Nature Biotech doi:10.1038/nbt.2908). In some embodiments, the FokI domain comprises mutations in the dimerization domain.

In other aspects, the invention comprises a cell or cell line in which an endogenous BCL11A enhancer sequence is modified, for example as compared to the wild-type sequence of the cell. The cell or cell lines may be heterozygous or homozygous for the modification. The modifications may comprise insertions, deletions and/or combinations thereof. In some preferred embodiments, the insertions, deletions and/or combinations thereof result in the destruction of a transcription factor binding site. In certain embodiments, the BCL11A enhancer sequence is modified by a nuclease (e.g., ZFN, TALEN, CRISPR/Cas system, TtAgo system, etc.). In certain embodiments, the BCL11A enhancer is modified anywhere between exon 2 and exon 3. In other embodiments, the BCL11A enhancer is modified in the regions shown in SEQ ID NO:1, SEQ ID NO:2 or SEQ ID NO:3 (FIG. 11). In certain embodiments, the modification is at or near the nuclease(s) binding and/or cleavage site(s), for example, within 1-300 (or any value therebetween) base pairs upstream or downstream of the site(s) of cleavage, more preferably within 1-100 base pairs (or any value therebetween) of either side of the binding and/or cleavage site(s), even more preferably within 1 to 50 base pairs (or any value therebetween) on either side of the binding and/or cleavage site(s). In certain embodiments, the modification is at or near the “+58” region of the BCL11A enhancer, for example, at or near a nuclease binding site shown in any of SEQ ID NOs:4 to 80 and 276. In other embodiments, the modification is at or near the “+55” region of the BCL11A enhancer, for example, at or near a nuclease site shown in any of SEQ ID NOs:143 to 184 and 232-251. In still further embodiments, the modification occurs at other BCL11A enhancer sequences. Any cell or cell line may be modified, for example a stem cell (hematopoietic stem cell). Partially or fully differentiated cells descended from the modified stem cells as described herein are also provided (e.g., RBCs or RBC precursor cells). Any of the modified cells or cell lines disclosed herein may show increased expression of gamma globin. Compositions such as pharmaceutical compositions comprising the genetically modified cells as described herein are also provided.

In other aspects, the invention comprises delivery of a donor nucleic acid to a target cell. The donor may be delivered prior to, after, or along with the nucleic acid encoding the nuclease(s). The donor nucleic acid may comprise an exogenous sequence (transgene) to be integrated into the genome of the cell, for example, at an endogenous locus. In some embodiments, the donor may comprise a full length gene or fragment thereof flanked by regions of homology with the targeted cleavage site. In some embodiments, the donor lacks homologous regions and is integrated into a target locus through a homology-independent mechanism (i.e. NHEJ). The donor may comprise any nucleic acid sequence, for example a nucleic acid that, when used as a substrate for homology-directed repair of the nuclease-induced double-strand break, leads to a donor-specified deletion to be generated at the endogenous chromosomal locus (e.g., BCL11A enhancer region) or, alternatively (or in addition to), novel allelic forms of (e.g., point mutations that ablate a transcription factor binding site) the endogenous locus to be created. In some aspects, the donor nucleic acid is an oligonucleotide wherein integration leads to a gene correction event, or a targeted deletion.

In other aspects, the nuclease and/or donor is(are) delivered by viral and/or non-viral gene transfer methods. In preferred embodiments, the donor is delivered to the cell via an adeno-associated virus (AAV). In some instances, the AAV comprises ITRs that are of a heterologous serotype in comparison with the capsid serotype.

In some aspects, the methods and compositions of the invention comprise one or more nucleases (e.g., ZFNs and/or TALENs) targeted to specific regions in the BCL11A enhancer region. In some embodiments, the one or more pairs of nucleases target sequences that result in the modification of the enhancer region by deletion of it in its entirety, while in other embodiments, subsections of the enhancer are deleted. In some embodiments, the deletion comprises one or more of the +55, +58 and/or +62 DNAseI hypersensitivity regions of the enhancer region. In other embodiments, a subset (less than all) of the hypersensitive regions is deleted. In some embodiments, only the +55, only the +58 or only the +62 region is deleted. In other embodiments, two of the regions are deleted (e.g., +55 and +58; +58 and +62; or +55 and +62).
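The deletion options listed above (a single region, any two regions, or all three) correspond to the non-empty subsets of the three hypersensitivity regions. A minimal Python sketch, assuming only the region labels from the text and a hypothetical helper name `deletion_sets`, enumerates them:

```python
# Minimal sketch: enumerate which combinations of the three DNAseI
# hypersensitivity regions could be deleted together.
from itertools import combinations

REGIONS = ("+55", "+58", "+62")


def deletion_sets(regions=REGIONS):
    """Return every non-empty combination of regions, smallest sets first."""
    return [
        combo
        for size in range(1, len(regions) + 1)
        for combo in combinations(regions, size)
    ]


for combo in deletion_sets():
    print(" and ".join(combo))
```

With three regions this yields seven combinations: three single-region deletions, three two-region deletions, and one deletion covering all three.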

In some aspects, deletions comprising regions within the DNAseI hypersensitive regions of the enhancer are made. These deletions can comprise from about 1 nucleotide to about 551 nucleotides. Thus, the deletions can comprise 1, 5, 10, 15, 20, 25, 30, 40, 50, 100, 150, 200, 250, 300, 350, 400, 450, 500, 550 nucleotides, or any value therebetween. In some embodiments, the deletions comprise binding regions for one or more transcription factors. In some preferred embodiments, the deletions comprise a GATA-1 binding site, or the binding site for GATA-1 in combination with other factors.

Some aspects of the invention relate to engineered (non-natural) DNA binding proteins that bind to the BCL11A enhancer sequence(s) but do not cleave it. In some embodiments, the Cas9 nuclease domain in a CRISPR/Cas system can be specifically engineered to lose DNA cleavage activity (“dCAS”), and fused to a functional domain capable of modulating gene expression (see Perez-Pinera (2013) Nat Method 10(10):973-976) to create a CRISPR/dCas-TF. In some instances, the engineered DNA binding domains block the transcription factors active in enhancer activity from binding to their cognate enhancer sequences.

In some embodiments, the DNA binding domains are fused to a functional domain. Some aspects include fusion of the DNA binding domains with domains capable of regulating the expression of a gene. In some embodiments, the fusion proteins comprise a DNA binding domain (zinc finger, TALE, CRISPR/dCas, TtAgo or other DNA binding domains that can be engineered for binding specificity) fused to a gene expression modulatory domain where the modulator represses gene expression.

In some embodiments, the HSC/PC cells are contacted with the nucleases and/or DNA binding proteins of the invention. In some embodiments, the nucleases and/or DNA binding proteins are delivered as nucleic acids and in other embodiments, they are delivered as proteins. In some embodiments, the nucleic acids are mRNAs encoding the nucleases and/or DNA binding proteins, and in further embodiments, the mRNAs may be protected. In some embodiments, the mRNA may be chemically modified, may comprise an ARCA cap and/or may comprise a mixture of unmodified and modified nucleotides.

In some aspects, the HSC/PC are contacted with the nucleases and/or DNA binding proteins of the invention ex vivo, following apheresis of the HSC/PC from a subject, or purification from harvested bone marrow. In some embodiments, the nucleases cause modifications within the BCL11A enhancer regions. In further embodiments, the HSC/PC containing the BCL11A enhancer region modifications are introduced back into the subject. In some instances, the HSC/PC containing the BCL11A enhancer region modifications are expanded prior to introduction. In other aspects, the genetically modified HSC/PC are given to the subject in a bone marrow transplant wherein the HSC/PC engraft, differentiate and mature in vivo. In some embodiments, the HSC/PC are isolated from the subject following G-CSF- and/or plerixafor-induced mobilization, and in others, the cells are isolated from human bone marrow or human umbilical cords. In some aspects, the subject is treated with a mild myeloablative procedure prior to introduction of the graft comprising the modified HSC/PC, while in other aspects, the subject is treated with a vigorous myeloablative conditioning regimen. In some embodiments, the methods and compositions of the invention are used to treat or prevent a hemoglobinopathy. In some aspects, the hemoglobinopathy is a beta thalassemia, while in other aspects, the hemoglobinopathy is sickle cell disease.

In some embodiments, the HSC/PC are further contacted with a donor molecule. In some embodiments, the donor molecule is delivered by a viral vector. The donor molecule may comprise one or more sequences encoding a functional polypeptide (e.g., a cDNA or fragment thereof), with or without a promoter. Additional sequences (coding or non-coding sequences) may be included when a donor molecule is used for inactivation, including but not limited to, sequences encoding a 2A peptide, SA site, IRES, etc.

In one aspect, the methods and compositions of the invention comprise methods for contacting the HSC/PC in vivo. The nucleases and/or DNA binding proteins are delivered to HSC/PC in situ by methods known in the art. In some embodiments, the nucleases and/or DNA binding proteins of the invention comprise a viral particle that is administered to the subject in need, while in others, the nucleases and/or DNA binding proteins comprise a nanoparticle (e.g. liposome). In some embodiments, the viral particles and/or nanoparticles are delivered to the organ (e.g. bone marrow) wherein the HSC/PC reside.

In another aspect, described herein are methods of integrating a donor nucleic acid into the genome of a cell via homology-independent mechanisms. The methods comprise creating a double-stranded break (DSB) in the genome of a cell and cleaving the donor molecule using a nuclease, such that the donor nucleic acid is integrated at the site of the DSB. In certain embodiments, the donor nucleic acid is integrated via non-homology dependent methods (e.g., NHEJ). As noted above, upon in vivo cleavage the donor sequences can be integrated in a targeted manner into the genome of a cell at the location of a DSB. The donor sequence can include one or more of the same target sites for one or more of the nucleases used to create the DSB. Thus, the donor sequence may be cleaved by one or more of the same nucleases used to cleave the endogenous gene into which integration is desired. In certain embodiments, the donor sequence includes different nuclease target sites from the nucleases used to induce the DSB. DSBs in the genome of the target cell may be created by any mechanism. In certain embodiments, the DSB is created by one or more zinc-finger nucleases (ZFNs), fusion proteins comprising a zinc finger binding domain, which is engineered to bind a sequence within the region of interest, and a cleavage domain or a cleavage half-domain. In other embodiments, the DSB is created by one or more TALE DNA-binding domains (naturally occurring or non-naturally occurring) fused to a nuclease domain (TALEN). In yet further embodiments, the DSB is created using a CRISPR/Cas nuclease system where an engineered single guide RNA or its functional equivalent is used to guide the nuclease to a targeted site in a genome.

In one aspect, the donor may encode a regulatory protein of interest (e.g. ZFP TFs, TALE TFs or a CRISPR/Cas TF) that binds to and/or modulates expression of a gene of interest. In one embodiment, the regulatory proteins bind to a DNA sequence and prevent binding of other regulatory factors. In another embodiment, the binding of the regulatory protein may modulate (i.e. induce or repress) expression of a target DNA.

In some embodiments, the transgenic HSC/PC cell and/or animal includes a transgene that encodes a human gene. In some instances, the transgenic animal comprises a knock out at the endogenous locus corresponding to the exogenous transgene, thereby allowing the development of an in vivo system where the human protein may be studied in isolation. Such transgenic models may be used for screening purposes to identify small molecules or large biomolecules or other entities which may interact with or modify the human protein of interest. In some aspects, the transgene is integrated into the selected locus (e.g., safe-harbor) into a stem cell (e.g., an embryonic stem cell, an induced pluripotent stem cell, a hematopoietic stem cell, etc.) or animal embryo obtained by any of the methods described herein, and then the embryo is implanted such that a live animal is born. The animal is then raised to sexual maturity and allowed to produce offspring wherein at least some of the offspring comprise the edited endogenous gene sequence or the integrated transgene.

In another aspect, provided herein is a method of altering gene expression (e.g., BCL11a and/or a globin gene) in a cell, the method comprising: introducing, into the cell, one or more nucleases as described herein, under conditions such that the one or more proteins are expressed and expression of the gene is altered. In certain embodiments, expression of a globin gene (e.g., gamma globin or beta globin) is altered (e.g., increased). Any of the methods described herein may further comprise integrating a donor sequence (e.g., transgene or fragment thereof under the control of an exogenous or endogenous promoter) into the genome of the cell, for example integrating a donor at or near the site of nuclease cleavage in the BCL11a gene. The donor sequence is introduced to the cell using a viral vector, as an oligonucleotide and/or on a plasmid. The cell in which gene expression is altered may be, for example, a red blood cell (RBC) precursor cell and/or a hematopoietic stem cell (e.g., CD34+ cell).

In other embodiments, provided herein is a method of producing a genetically modified cell comprising a genomic modification within an endogenous BCL11a enhancer sequence, the method comprising the steps of: a) contacting a cell with a polynucleotide (e.g. DNA or mRNA) encoding a zinc finger nuclease comprising 4, 5, or 6 zinc finger domains in which each of the zinc finger domains comprises a recognition helix region in the order shown in a single row of Table 3 or Table 6; b) subjecting the cell to conditions conducive to expressing the zinc finger protein from the polynucleotide; and c) modifying the endogenous BCL11A enhancer sequence with the expressed zinc finger protein sufficient to produce the genetically modified cell. In certain embodiments, the cells are stimulated with at least one cytokine (e.g., prior to step (a)). The polynucleotide may be contacted with the cell using any suitable method, including but not limited to, via transfection, using a non-viral vector, using a viral vector, by chemical means or by exposure to an electric field (e.g., electroporation).

Also provided is a method of treating a patient in need of an increase in globin gene expression, the method comprising administering to the patient the pharmaceutical preparation as described herein in an amount sufficient to increase the globin gene expression in the patient. In certain embodiments, the patient is known to have, is suspected of having, or is at risk of developing a thalassemia or sickle cell disease.

A kit, comprising the nucleic acids, proteins and/or cells of the invention, is also provided. The kit may comprise nucleic acids encoding the nucleases, (e.g. RNA molecules or ZFN, TALEN or CRISPR/Cas system encoding genes contained in a suitable expression vector), or aliquots of the nuclease proteins, donor molecules, suitable stemness modifiers, cells, buffers, and/or instructions (e.g., for performing the methods of the invention) and the like.

The invention therefore includes, but is not limited to, the following embodiments:

1. A genetically modified cell comprising a genomic modification made by a nuclease, wherein the genomic modification is within an endogenous BCL11a enhancer sequence, and further wherein the genomic modification is selected from the group consisting of insertions, deletions and combinations thereof.

2. The genetically modified cell of embodiment 1, wherein the genomic modification is within one or more of the sequences shown in SEQ ID NO:1, 2 or 3.

3. The genetically modified cell of embodiment 2, wherein the genomic modification is within at least 9 contiguous base pairs of SEQ ID NO:1, 2 or 3.

4. The genetically modified cell of embodiment 2, wherein the genomic modification is within the +55 BCL11A enhancer sequence (SEQ ID NO:1).

5. The genetically modified cell of embodiment 4, wherein the genomic modification is at or near any of the sequences shown as SEQ ID Nos. 143 to 184 and 232-251.

6. The genetically modified cell of embodiment 2, wherein the genomic modification is within the +58 BCL11A enhancer sequence (SEQ ID NO:2).

7. The genetically modified cell of embodiment 6, wherein the genomic modification is at or near any of the sequences shown as SEQ ID Nos. 4 to 80 and 276.

8. The genetically modified cell of embodiment 2, wherein the genomic modification is within the +62 BCL11A enhancer sequence (SEQ ID NO:3).

9. The genetically modified cell of any of embodiments 1 to 8, wherein the cell is a stem cell.

10. The genetically modified cell of embodiment 9, wherein the stem cell is a hematopoietic stem cell.

11. The genetically modified cell of embodiment 10, wherein the hematopoietic stem cell is a CD34+ cell.

12. A genetically modified differentiated cell descended from the stem cell of any of embodiments 1 to 11.

13. The genetically modified cell of embodiment 12, wherein the cell is a red blood cell (RBC).

14. The genetically modified cell of any of embodiments 1 to 13, wherein the nuclease comprises at least one zinc finger nuclease (ZFN) or TALEN.

15. The genetically modified cell of any of embodiments 1 to 14, wherein the nuclease is introduced into the cell as a polynucleotide.

16. The genetically modified cell of any of embodiments 1 to 15, wherein the insertion comprises integration of a donor polynucleotide encoding a transgene.

17. The genetically modified cell of any of embodiments 14 to 16, wherein the nuclease comprises a zinc finger nuclease, the zinc finger nuclease comprising 4, 5, or 6 zinc finger domains comprising a recognition helix and further wherein the zinc finger proteins comprise the recognition helix regions in the order shown in a single row of Table 3 or Table 6.

18. The genetically modified cell of any of embodiments 14 to 16, wherein the nuclease comprises a TALEN, the TALEN comprising a plurality of TALE repeat units, each repeat unit comprising a hypervariable diresidue region (RVD), wherein the RVDs of the TALE repeat units are shown in a single row of Table 1, Table 2 or Table 4.

19. A pharmaceutical composition comprising the genetically modified cell of any of embodiments 1 to 18.

20. A DNA-binding protein comprising a zinc finger protein or a TALE-effector protein (TALE), wherein

(i) the zinc finger protein comprises 4, 5 or 6 zinc finger domains comprising a recognition helix region, wherein the zinc finger proteins comprise the recognition helix regions in the order shown in a single row of Table 3 or Table 6; and

(ii) the TALE protein comprising a plurality of TALE repeat units, each repeat unit comprising a hypervariable diresidue region (RVD), wherein the RVDs of the TALE repeat units are shown in a single row of Table 1, Table 2 or Table 4.

21. A fusion protein comprising a zinc finger protein or TALE protein of embodiment 20 and a wild-type or engineered cleavage domain or cleavage half-domain.

22. A polynucleotide encoding one or more proteins of embodiment 20 or embodiment 21.

23. An isolated cell comprising one or more proteins according to embodiment 20 or embodiment 21.

24. An isolated cell comprising one or more polynucleotides according to embodiment 22.

25. The cell of embodiment 23 or embodiment 24, wherein the cell is a hematopoietic stem cell.

26. A kit comprising at least one of: i) a polynucleotide encoding the protein according to embodiment 20 or embodiment 21 or ii) a protein according to embodiment 20 or embodiment 21.

27. A method of altering globin gene expression in a cell, the method comprising:

introducing, into the cell, one or more polynucleotides according to embodiment 22, under conditions such that the one or more proteins are expressed and expression of the globin gene is altered.

28. The method of embodiment 27, wherein expression of the globin gene is increased.

29. The method of embodiment 27 or embodiment 28, wherein the globin gene is a gamma globin or beta globin gene.

30. The method of any of embodiments 27 to 29, further comprising integrating a donor sequence into the genome of the cell.

31. The method of embodiment 30, wherein the donor sequence is introduced to the cell using a viral vector, as an oligonucleotide or on a plasmid.

32. The method of any of embodiments 27 to 31, wherein the cell is selected from the group consisting of a red blood cell (RBC) precursor cell and a hematopoietic stem cell.

33. The method of any of embodiments 30 to 32, wherein the donor sequence comprises a transgene under the control of an endogenous or exogenous promoter.

34. A method of producing a genetically modified cell comprising a genomic modification within an endogenous BCL11a enhancer sequence, the method comprising the steps of:

a) contacting a cell with a polynucleotide encoding a fusion protein comprising a zinc finger nuclease comprising 4, 5, or 6 zinc finger domains in which each of the zinc finger domains comprises a recognition helix region in the order shown in a single row of Table 3 or Table 6,

b) subjecting the cell to conditions conducive to expressing the fusion protein from the polynucleotide; and

c) modifying the endogenous BCL11A enhancer sequence with the expressed fusion protein sufficient to produce the genetically modified cell.

35. The method of embodiment 34, wherein the method further comprises stimulating the cells with at least one cytokine.

36. The method of embodiment 34 or embodiment 35, wherein the method further comprises the step of delivering the polynucleotide inside the cell.

37. The method of embodiment 36, wherein the delivery step comprises use of at least one of a non-viral delivery system, a viral delivery system, and a delivery vehicle.

38. The method of any of embodiments 34 to 37, wherein the delivery step further comprises subjecting the cells to an electric field.

39. A kit for performing the method of any of embodiments 34 to 37, the kit comprising:

a) at least one polynucleotide encoding a fusion protein comprising a zinc finger nuclease comprising 4, 5, or 6 zinc finger domains in which each of the zinc finger domains comprises a recognition helix region in the order shown in a single row of Table 3 or Table 6,

b) at least one polynucleotide encoding a TALE protein comprising a plurality of TALE repeat units, each repeat unit comprising a hypervariable diresidue region (RVD), wherein the RVDs of the TALE repeat units are shown in a single row of Table 1, Table 2 or Table 4; and optionally,

c) directions for using the kit.

40. A method of treating a patient in need of an increase in globin gene expression, the method comprising administering to the patient the pharmaceutical preparation of embodiment 19 in an amount sufficient to increase the globin gene expression in the patient.

41. The method of embodiment 40, wherein the patient is known to have, is suspected of having, or is at risk of developing a globinopathy.

42. The method of embodiment 41, wherein the globinopathy is a thalassemia or sickle cell disease.

43. The method of embodiment 42, wherein the thalassemia is β-thalassemia.

These and other aspects will be readily apparent to the skilled artisan in light of the disclosure as a whole.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts a diagram of the BCL11A coding region, indicating the position of the introns and of the enhancer regions. The derivations of the differing splicing products are also indicated (see Liu et al. (2006) Molecular Cancer 5:18).

FIG. 2 depicts the genomic region that encodes the various Bcl11a isoforms (University of California Santa Cruz genome browser, coordinates listed in the hg19 assembly of the human genome), the enhancer region in the BCL11A intron 2 (coordinates listed in the hg19 assembly of the human genome), and defines the three subregions of the enhancer region as represented by DNAse I hypersensitive sites (listed in kb by approximate distance from the transcription start site): +55, +58 and +62. Nuclease target locations within the three subregions are indicated as follows: nucleases were designed to cleave on the left end of each subregion (the ‘L’ sites), in the middle of each (the ‘M’ sites) and on the right end (the ‘R’ sites). Cleavage of the locus in vivo with a pair of nucleases results in a deletion of the intervening region in a significant fraction of the cells.
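By way of illustration only, the deletion outcome described for a pair of nucleases can be sketched in Python. The function name `delete_between_cuts`, the toy sequence and the cut coordinates are hypothetical and are not taken from the disclosure; the sketch also assumes that the two outer ends rejoin cleanly, without the additional small insertions or deletions that end joining may introduce.

```python
def delete_between_cuts(seq: str, left_cut: int, right_cut: int) -> str:
    """Return seq with the segment between two cut positions removed.

    Cut positions are 0-based and fall between bases: a cut at index i
    separates seq[i - 1] from seq[i]. Clean rejoining of the outer ends
    is an assumption made for illustration.
    """
    if not 0 <= left_cut <= right_cut <= len(seq):
        raise ValueError("cuts must satisfy 0 <= left_cut <= right_cut <= len(seq)")
    return seq[:left_cut] + seq[right_cut:]


# Hypothetical 16 bp locus with an 'L' cut after base 4 and an 'R' cut after base 12.
toy_locus = "AAAACCCCGGGGTTTT"
edited = delete_between_cuts(toy_locus, 4, 12)
deleted_length = len(toy_locus) - len(edited)
print(edited, deleted_length)
```

In this toy case the eight intervening bases are lost and the flanking sequences are joined directly.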

FIG. 3 depicts the results of in-cell cleavage by the TALEN pair sets (indicated in the table in the upper panel of the figure) in the BCL11A enhancer region as gauged by a PCR-based assay for deletion of various regions of the enhancer. Pairs of TALENs used were designed to create deletions in human HSPCs either within the +55 region (“55L-R”), within the +62 region (“62L-R”), or within the +58 region (“58L-R”, “58M-R”). The gel shows PCR products produced by isolating genomic DNA from human HSPCs transfected with expression constructs encoding the indicated TALENs and amplifying the region surrounding the target region. These data demonstrate the generation of deletions (band indicated by the symbol A) in the targeted region following cleavage by the TALEN pair sets indicated.

FIG. 4 is a graph showing results from a real-time RT-qPCR (“Taqman®”) analysis designed to detect a change in expression of fetal gamma-globin mRNA following targeted editing of the BCL11A enhancer. Following electroporation of CD34 cells from healthy human volunteers with mRNAs encoding the designated nucleases (see FIG. 3), erythrocytes were generated in vitro, after which total RNA was harvested. The relative levels of alpha globin and gamma globin mRNA for each sample were determined in an RT-PCR Taqman® analysis, and the relative ratio of gamma globin mRNA/alpha globin mRNA was plotted. Thus, increasing gamma globin expression in the nuclease-treated samples leads to an increase in the normalized gamma/alpha ratio compared to the controls. The Figure displays the results from treating CD34 cells with single TALEN pairs and for the TALEN pair sets described in FIGS. 2 and 3. The level of gamma/alpha is increased for the 58L-R and 58M-R pair sets. Note that the ratio in the GFP transfection control was 3.4 in this experiment.

FIG. 5 is a graph showing results from a real-time RT-qPCR (“Taqman®”) analysis designed to detect a change in expression of fetal gamma-globin mRNA following targeted editing of the BCL11A enhancer. Following electroporation of CD34 cells from healthy human volunteers with mRNAs encoding the designated nucleases (see FIG. 3), erythrocytes were generated in vitro, after which total RNA was harvested. The relative levels of beta globin and gamma globin mRNA for each sample were determined in an RT-PCR Taqman® analysis, and the relative ratio of gamma globin mRNA/beta globin mRNA was plotted. The results demonstrate that while the specific single TALEN pairs used in these experiments were not able to induce a change in gamma globin expression, creation of deletions by use of the pair sets caused an increase in relative gamma expression, corrected by adult beta globin expression in this instance. In particular, deletion of the DNA sequence encompassed by the +55 or by the +58 DNAse I hypersensitive site elevated gamma globin, while such elevation following a deletion of the sequence encompassed by the +62 DNAse I hypersensitive site was not detected in this experiment. Note that the ratio in the GFP transfection control was 0.5 in this experiment.
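The comparison of a gamma globin ratio in treated cells with the same ratio in control cells can be sketched as follows. This is a minimal illustration, assuming that the comparison is expressed as a simple fold change (sample ratio divided by control ratio); the function name `fold_over_control` and the sample values 3.0 and 4.0 are hypothetical, and only the control ratio of 0.5 is taken from the description of FIG. 5.

```python
def fold_over_control(gamma: float, reference: float, control_ratio: float) -> float:
    """Return (gamma / reference) for a sample divided by the control ratio.

    'reference' is the relative level of the normalizing transcript
    (alpha or beta globin mRNA). Expressing the result as a fold change
    is an assumption made for illustration.
    """
    if reference <= 0 or control_ratio <= 0:
        raise ValueError("reference and control_ratio must be positive")
    return (gamma / reference) / control_ratio


# Hypothetical relative mRNA levels for one nuclease-treated sample.
sample_gamma, sample_beta = 3.0, 4.0
gfp_control_ratio = 0.5  # gamma/beta ratio reported for the GFP control in FIG. 5
fold = fold_over_control(sample_gamma, sample_beta, gfp_control_ratio)
print(fold)
```

A value above 1 would indicate a higher gamma/beta ratio in the treated sample than in the GFP control.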

FIG. 6 shows the results from a Taqman® analysis for fetal globin levels as described in FIG. 4. In this set of experiments, the levels of beta globin and gamma globin mRNA were measured, and thus the data depicted is the ratio of gamma to beta and compared to the same gamma/beta ratio in control treated cells. A series of single TALEN pairs were used to “walk across” the +58 region of the BCL11A enhancer. In contrast to the earlier experiments where a deletion of 400-900 base pairs was required to see an increase in gamma expression, the results depicted in this experiment demonstrated a single site (indicated by an arrow) that when cleaved caused that relative increase. Results from the deletion pairs are included in this graph for comparison. See FIG. 7 for location of cleavage sites of the TALEN pairs across the region of interest.

FIG. 7 shows the results from a Taqman® analysis as described in FIG. 4 using ZFN pairs targeted to the +58 enhancer region. The levels of beta globin and gamma globin were characterized and used to express the ratio of gamma to beta-globin compared to the same ratio in control treated cells. The data demonstrate that ZFN-driven disruption of the same region identified in the TALEN screen (FIG. 6 and see FIG. 8 below) resulted in increased gamma globin expression. Note that the ratio in the GFP transfection control was 2.6 in this experiment.

FIG. 8 shows a representation of the binding sites of the +58 enhancer region specific TALENs (102852 and 102853) and ZFNs (45843 and 45844), use of which in human HSPCs increases the relative expression of gamma globin following in vitro erythropoiesis. The sequence shown is the double stranded form of the DNA sequence encompassing the +58 region of the BCL11A enhancer (SEQ ID NO:264), and the numbering system relates to the +58 itself. Also indicated in the figure is the location of a match to the binding site of the GATA-1 transcription factor (sequence of locus, gtGATAAag; consensus GATA-1 site, swGATAAvv). Additionally, the cleavage sites of the TALEN pairs used in the +58 “walk” are indicated where the numbers correspond to the samples used in the data sets presented in FIG. 6.
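The match between the locus sequence and the degenerate consensus can be checked with a short Python sketch. It relies on the standard IUPAC nucleotide ambiguity codes (S = C or G, W = A or T, V = A, C or G); the helper name `consensus_to_regex` is hypothetical, and the sketch examines only the strand as written, not its reverse complement.

```python
import re

# Subset of the IUPAC nucleotide ambiguity codes needed for this consensus.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "[CG]",   # strong: C or G
    "W": "[AT]",   # weak: A or T
    "V": "[ACG]",  # any base except T
}


def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate a degenerate consensus into a compiled regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


gata1 = consensus_to_regex("swGATAAvv")
locus_matches = gata1.fullmatch("gtGATAAag".upper()) is not None
print(locus_matches)
```

The same compiled pattern could be passed to `finditer` to scan a longer sequence for candidate sites, with the caveat that a second pass over the reverse complement would be needed to cover both strands.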

FIG. 9 shows the results from a Taqman® analysis as described in FIG. 4 where a series of TALENs were made to target the +55 region of the BCL11A enhancer. In this set of experiments, the levels of beta globin and gamma globin were characterized, and thus the data depicted is the ratio of gamma to beta and compared to the same ratio in control treated cells (which, in this experiment, was 0.8). The data confirm that mutations generated at specific positions within the +55 region can increase relative gamma globin expression (see arrows).

FIG. 10 is a representation of the cleavage sites of the +55 enhancer region specific TALENs as shown in FIG. 9. The sequence shown is the double stranded form of the DNA sequence encompassing the +55 region of the BCL11A enhancer (SEQ ID NO:254). The numbers that highlight short regions of nucleotides indicate the likely cleavage sites induced by the TALEN samples listed in FIG. 10. Also indicated in FIG. 10 are two matches to the consensus binding site for the GATA-1 transcription factor.

FIGS. 11A to 11C display the DNA sequence of the three DNAse I hypersensitive sites within the BCL11A enhancer sequence. Because their identification was performed by probing regions of accessible chromatin in cells (see Bauer et al, (2013) ibid), the exact boundaries of the regions are not known and approximate boundaries are shown. FIG. 11A shows the sequence of the +55 region (SEQ ID NO:1), FIG. 11B shows the sequence of the +58 region (SEQ ID NO:2) and FIG. 11C shows the sequence of the +62 region (SEQ ID NO:3).

FIGS. 12A to 12C demonstrate that ZFN-driven cleavage in cells closer to the core of the GATA-1 consensus elevates fetal globin levels to an even greater extent than cleavage closer to the 3′ end of the motif. FIG. 12A displays a diagram depicting the binding sites of the Bcl11A-specific ZFN pairs in relation to the GATA-1 consensus sequence and depicts a DNA sequence within the +58 region comprising the GATA-1 consensus sequence (SEQ ID NO:255). Bars above and below the DNA sequence indicate the binding sites of the ZFNs. FIG. 12B shows the relative expression of gamma globin and beta globin as measured by mRNA expression following transfection of human HSPCs with mRNA encoding the indicated ZFNs (see FIG. 12A), followed by in vitro erythropoiesis and measurement of levels of fetal globin (see, FIG. 4). The ratio observed when a GFP expressing mRNA was transfected into the CD34+ cells was 0.97. FIG. 12C represents in “pie chart” form the allelic forms of the BCL11A enhancer (specifically, the region cleaved by the ZFNs shown in FIG. 12A) found in human HSPCs following electroporation with the indicated ZFNs. While comparable levels of unmodified (wild-type) chromatids are observed in the two samples, the sample treated with ZFNs that cut closer to the GATA-1 motif contains a greater number of chromatids that eliminate the GATA-1 consensus (e.g., the “−15” allele, which represents a deletion of 15 base pairs). The data demonstrate that cleavage by the two ZFN pairs that are closer to the center of the GATA-1 consensus sequence (pairs 46801/46880 and 46923/46999) is associated with increased gamma globin expression.

FIG. 13 demonstrates that altering the linker between the zinc finger and FokI moiety in ZFNs used for genome editing of the BCL11A enhancer affects fetal globin levels following in vitro erythropoiesis despite comparable levels of disrupted chromatids. Human HSPCs were electroporated with the indicated ZFNs (the linker used in each ZFN monomer is indicated in parentheses), and immediately prior to and following in vitro erythroid differentiation, the % of disrupted alleles was measured (shown below each sample in “X/Y” form, with the first number corresponding to % of non-wild-type indels following electroporation, and the second number showing results following 14 days of in vitro erythroid differentiation). Whole mRNA was harvested and the levels of fetal globin (normalized to alpha globin) were measured.

FIG. 14 depicts the amino acid and DNA sequences for four linkers (L0 (SEQ ID NO:256 and 257), L7a (SEQ ID NO:258 and 259), L7c5 (SEQ ID NO:260 and 261) and L8c5 (SEQ ID NO:262 and 263)) used in the ZFP designs. Sequences with a solid underline indicate the carboxy terminal region of the ZFP DNA binding domain, while sequences indicated with the dashed underline indicate the amino terminal region of the Fok I nuclease domain. Sequences in bold indicate the novel sequences added to the standard L0 linker.

FIGS. 15A and 15B depict the percent of cells of human origin, and targeted genetic modification at the nuclease target site in these human cells, respectively, found in the peripheral blood of mice following edited human CD34+ cell transplant. FIG. 15A is a graph depicting the percent human cells in the mouse periphery following transplantation of human CD34+ cells that had been edited with two different sets of ZFN 4 weeks post-transplant. FIG. 15B depicts the level of indels detected in those human cells. Each symbol represents data obtained from an individual mouse.

FIGS. 16A through 16D depict the percentage of indels induced by the nucleases in human cells that differentiated from the original transplanted CD34+ cells 16 weeks post transplantation. FIG. 16A shows the level of indel activity in pan-myeloid cells, identified by the presence of the CD33 marker. FIG. 16B shows the activity in CD19+ B cells. FIG. 16C shows the activity in glyA+ or erythroid cells, while FIG. 16D shows the activity in stem cells. Each symbol represents data obtained from an individual mouse.

FIG. 17 shows a series of linker sequences (SEQ ID NOS 265-275, respectively, in order of appearance). These linkers can serve to link the zinc finger DNA binding domain to the FokI nuclease domain.

DETAILED DESCRIPTION

Disclosed herein are compositions and methods for genome engineering for the modulation of BCL11A and/or gamma globin expression and for the treatment and/or prevention of hemoglobinopathies. In particular, nuclease-mediated (i.e. ZFN, TALEN or CRISPR/Cas or TtAgo system) targeted deletion of specific sites in a BCL11A enhancer region is efficiently achieved in HSC/PC and results in a change in relative gamma globin expression during subsequent erythropoiesis. This modulation of BCL11A and gamma globin expression is particularly useful for treatment of hemoglobinopathies (e.g., beta thalassemias, sickle cell disease) wherein there is insufficient beta globin expression or expression of a mutated form of beta-globin. Using the methods and compositions of the invention, the complications and disease related sequelae caused by the aberrant beta globin can be overcome by alteration of the expression of gamma globin in erythrocyte precursor cells.

General

Practice of the methods, as well as preparation and use of the compositions disclosed herein employ, unless otherwise indicated, conventional techniques in molecular biology, biochemistry, chromatin structure and analysis, computational chemistry, cell culture, recombinant DNA and related fields as are within the skill of the art. These techniques are fully explained in the literature. See, for example, Sambrook et al. MOLECULAR CLONING: A LABORATORY MANUAL, Second edition, Cold Spring Harbor Laboratory Press, 1989 and Third edition, 2001; Ausubel et al., CURRENT PROTOCOLS IN MOLECULAR BIOLOGY, John Wiley & Sons, New York, 1987 and periodic updates; the series METHODS IN ENZYMOLOGY, Academic Press, San Diego; Wolfe, CHROMATIN STRUCTURE AND FUNCTION, Third edition, Academic Press, San Diego, 1998; METHODS IN ENZYMOLOGY, Vol. 304, “Chromatin” (P. M. Wassarman and A. P. Wolffe, eds.), Academic Press, San Diego, 1999; and METHODS IN MOLECULAR BIOLOGY, Vol. 119, “Chromatin Protocols” (P. B. Becker, ed.) Humana Press, Totowa, 1999.

Definitions

The terms “nucleic acid,” “polynucleotide,” and “oligonucleotide” are used interchangeably and refer to a deoxyribonucleotide or ribonucleotide polymer, in linear or circular conformation, and in either single- or double-stranded form. For the purposes of the present disclosure, these terms are not to be construed as limiting with respect to the length of a polymer. The terms can encompass known analogues of natural nucleotides, as well as nucleotides that are modified in the base, sugar and/or phosphate moieties (e.g., phosphorothioate backbones). In general, an analogue of a particular nucleotide has the same base-pairing specificity; i.e., an analogue of A will base-pair with T.

The terms “polypeptide,” “peptide” and “protein” are used interchangeably to refer to a polymer of amino acid residues. The term also applies to amino acid polymers in which one or more amino acids are chemical analogues or modified derivatives of corresponding naturally-occurring amino acids.

“Binding” refers to a sequence-specific, non-covalent interaction between macromolecules (e.g., between a protein and a nucleic acid). Not all components of a binding interaction need be sequence-specific (e.g., contacts with phosphate residues in a DNA backbone), as long as the interaction as a whole is sequence-specific. Such interactions are generally characterized by a dissociation constant (K_d) of 10⁻⁶ M or lower. “Affinity” refers to the strength of binding: increased binding affinity being correlated with a lower K_d.
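The relationship between the dissociation constant and affinity can be illustrated numerically. The sketch below assumes a simple one-to-one binding equilibrium in which the free concentration of the binding partner is approximately equal to its total concentration, so that the fraction of sites occupied is [L] / ([L] + K_d). The function name `fraction_bound` and the concentrations used are hypothetical.

```python
def fraction_bound(ligand_molar: float, kd_molar: float) -> float:
    """Fraction of binding sites occupied at equilibrium for 1:1 binding.

    Both arguments are in molar units. Assumes the free ligand
    concentration is well approximated by ligand_molar.
    """
    if ligand_molar < 0 or kd_molar <= 0:
        raise ValueError("ligand_molar must be >= 0 and kd_molar must be > 0")
    return ligand_molar / (ligand_molar + kd_molar)


# At a ligand concentration equal to the dissociation constant, half the sites are occupied.
half_occupied = fraction_bound(1e-6, 1e-6)
# A lower dissociation constant (higher affinity) gives higher occupancy at the same concentration.
tighter_binder = fraction_bound(1e-6, 1e-9)
print(half_occupied, tighter_binder > half_occupied)
```

Under these assumptions, a thousandfold lower dissociation constant moves occupancy at the same concentration from one half to nearly complete.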

A “binding protein” is a protein that is able to bind to another molecule. A binding protein can bind to, for example, a DNA molecule (a DNA-binding protein), an RNA molecule (an RNA-binding protein) and/or a protein molecule (a protein-binding protein). In the case of a protein-binding protein, it can bind to itself (to form homodimers, homotrimers, etc.) and/or it can bind to one or more molecules of a different protein or proteins. A binding protein can have more than one type of binding activity. For example, zinc finger proteins have DNA-binding, RNA-binding and protein-binding activity.

A “zinc finger DNA binding protein” (or binding domain) is a protein, or a domain within a larger protein, that binds DNA in a sequence-specific manner through one or more zinc fingers, which are regions of amino acid sequence within the binding domain whose structure is stabilized through coordination of a zinc ion. The term zinc finger DNA binding protein is often abbreviated as zinc finger protein or ZFP.

A “TALE DNA binding domain” or “TALE” is a polypeptide comprising one or more TALE repeat domains/units. The repeat domains are involved in binding of the TALE to its cognate target DNA sequence. A single “repeat unit” (also referred to as a “repeat”) is typically 33-35 amino acids in length and exhibits at least some sequence homology with other TALE repeat sequences within a naturally occurring TALE protein.

Zinc finger and TALE binding domains can be “engineered” to bind to a predetermined nucleotide sequence, for example via engineering (altering one or more amino acids) of the recognition helix region of a naturally occurring zinc finger or TALE protein. Therefore, engineered DNA binding proteins (zinc fingers or TALEs) are proteins that are non-naturally occurring. Non-limiting examples of methods for engineering DNA-binding proteins are design and selection. A designed DNA binding protein is a protein not occurring in nature whose design/composition results principally from rational criteria. Rational criteria for design include application of substitution rules and computerized algorithms for processing information in a database storing information of existing ZFP and/or TALE designs and binding data. See, for example, U.S. Pat. Nos. 6,140,081; 6,453,242; 6,534,261 and 8,585,526; see also WO 98/53058; WO 98/53059; WO 98/53060; WO 02/016536 and WO 03/016496.

A “selected” zinc finger protein or TALE is a protein not found in nature whose production results primarily from an empirical process such as phage display, interaction trap or hybrid selection. See e.g., U.S. Pat. Nos. 5,789,538; 5,925,523; 6,007,988; 6,013,453; 6,200,759; 8,586,526; WO 95/19431; WO 96/06166; WO 98/53057; WO 98/54311; WO 00/27878; WO 01/60970; WO 01/88197; WO 02/099084.

“TtAgo” is a prokaryotic Argonaute protein thought to be involved in gene silencing. TtAgo is derived from the bacterium Thermus thermophilus. See, e.g., Swarts et al, ibid; G. Sheng et al., (2013) Proc. Natl. Acad. Sci. U.S.A. 111, 652. A “TtAgo system” is all the components required including, for example, guide DNAs for cleavage by a TtAgo enzyme.

“Recombination” refers to a process of exchange of genetic information between two polynucleotides, including but not limited to, donor capture by non-homologous end joining (NHEJ) and homologous recombination. For the purposes of this disclosure, “homologous recombination (HR)” refers to the specialized form of such exchange that takes place, for example, during repair of double-strand breaks in cells via homology-directed repair mechanisms. This process requires nucleotide sequence homology, uses a “donor” molecule to template repair of a “target” molecule (i.e., the one that experienced the double-strand break), and is variously known as “non-crossover gene conversion” or “short tract gene conversion,” because it leads to the transfer of genetic information from the donor to the target. Without wishing to be bound by any particular theory, such transfer can involve mismatch correction of heteroduplex DNA that forms between the broken target and the donor, and/or “synthesis-dependent strand annealing,” in which the donor is used to resynthesize genetic information that will become part of the target, and/or related processes. Such specialized HR often results in an alteration of the sequence of the target molecule such that part or all of the sequence of the donor polynucleotide is incorporated into the target polynucleotide.

In the methods of the disclosure, one or more targeted nucleases asdescribed herein create a double-stranded break (DSB) in the targetsequence (e.g., cellular chromatin) at a predetermined site. The DSB mayresult in deletions and/or insertions by homology-directed repair or bynon-homology-directed repair mechanisms. Deletions may include anynumber of base pairs. Similarly, insertions may include any number ofbase pairs including, for example, integration of a “donor”polynucleotide, optionally having homology to the nucleotide sequence inthe region of the break. The donor sequence may be physically integratedor, alternatively, the donor polynucleotide is used as a template forrepair of the break via homologous recombination, resulting in theintroduction of all or part of the nucleotide sequence as in the donorinto the cellular chromatin. Thus, a first sequence in cellularchromatin can be altered and, in certain embodiments, can be convertedinto a sequence present in a donor polynucleotide. Thus, the use of theterms “replace” or “replacement” can be understood to representreplacement of one nucleotide sequence by another, (i.e., replacement ofa sequence in the informational sense), and does not necessarily requirephysical or chemical replacement of one polynucleotide by another.

In any of the methods described herein, additional pairs of zinc-finger proteins or TALENs can be used for additional double-stranded cleavage of additional target sites within the cell.

Any of the methods described herein can be used for insertion of a donor of any size and/or partial or complete inactivation of one or more target sequences in a cell by targeted integration of donor sequence that disrupts expression of the gene(s) of interest. Cell lines with partially or completely inactivated genes are also provided.

In any of the methods described herein, the exogenous nucleotide sequence (the “donor sequence” or “transgene”) can contain sequences that are homologous, but not identical, to genomic sequences in the region of interest, thereby stimulating homologous recombination to insert a non-identical sequence in the region of interest. Thus, in certain embodiments, portions of the donor sequence that are homologous to sequences in the region of interest exhibit between about 80 and 99% (or any integer therebetween) sequence identity to the genomic sequence that is replaced. In other embodiments, the homology between the donor and genomic sequence is higher than 99%, for example if only 1 nucleotide differs as between donor and genomic sequences of over 100 contiguous base pairs. In certain cases, a non-homologous portion of the donor sequence can contain sequences not present in the region of interest, such that new sequences are introduced into the region of interest. In these instances, the non-homologous sequence is generally flanked by sequences of 50-1,000 base pairs (or any integral value therebetween) or any number of base pairs greater than 1,000, that are homologous or identical to sequences in the region of interest. In other embodiments, the donor sequence is non-homologous to the first sequence, and is inserted into the genome by non-homologous recombination mechanisms.
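By way of non-limiting illustration, the sequence identity between a homologous portion of a donor and the corresponding genomic sequence can be estimated as in the following minimal sketch. The function name `percent_identity` and the example sequences are hypothetical and are not part of the disclosure; the sketch assumes an ungapped, equal-length comparison rather than a full alignment.

```python
def percent_identity(donor_arm: str, genomic: str) -> float:
    """Percent of positions that match in an ungapped, equal-length comparison."""
    if not donor_arm or len(donor_arm) != len(genomic):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(donor_arm.upper(), genomic.upper()))
    return 100.0 * matches / len(donor_arm)


# Hypothetical 120 bp genomic segment and a donor arm differing at one position.
genomic = "ACGT" * 30
donor = "T" + genomic[1:]          # single mismatch at the first base
identity = percent_identity(donor, genomic)
print(f"{identity:.2f}% identity over {len(genomic)} bp")
```

Under these assumptions a single mismatch over 120 contiguous base pairs gives an identity above 99%, corresponding to the higher-homology case described above.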

“Cleavage” refers to the breakage of the covalent backbone of a DNA molecule. Cleavage can be initiated by a variety of methods including, but not limited to, enzymatic or chemical hydrolysis of a phosphodiester bond. Both single-stranded cleavage and double-stranded cleavage are possible, and double-stranded cleavage can occur as a result of two distinct single-stranded cleavage events. DNA cleavage can result in the production of either blunt ends or staggered ends. In certain embodiments, fusion polypeptides are used for targeted double-stranded DNA cleavage.

A “cleavage half-domain” is a polypeptide sequence which, in conjunction with a second polypeptide (either identical or different) forms a complex having cleavage activity (preferably double-strand cleavage activity). The terms “first and second cleavage half-domains;” “+ and − cleavage half-domains” and “right and left cleavage half-domains” are used interchangeably to refer to pairs of cleavage half-domains that dimerize.

An “engineered cleavage half-domain” is a cleavage half-domain that has been modified so as to form obligate heterodimers with another cleavage half-domain (e.g., another engineered cleavage half-domain). See, also, U.S. Patent Publication Nos. 2005/0064474, 20070218528, 20080131962 and 20110201055, incorporated herein by reference in their entireties.

The term “sequence” refers to a nucleotide sequence of any length, which can be DNA or RNA; can be linear, circular or branched and can be either single-stranded or double stranded. The term “donor sequence” refers to a nucleotide sequence that is inserted into a genome. A donor sequence can be of any length, for example between 2 and 100,000,000 nucleotides in length (or any integer value therebetween or thereabove), preferably between about 100 and 100,000 nucleotides in length (or any integer therebetween), more preferably between about 2000 and 20,000 nucleotides in length (or any value therebetween) and even more preferably, between about 5 and 15 kb (or any value therebetween).

“Chromatin” is the nucleoprotein structure comprising the cellular genome. Cellular chromatin comprises nucleic acid, primarily DNA, and protein, including histones and non-histone chromosomal proteins. The majority of eukaryotic cellular chromatin exists in the form of nucleosomes, wherein a nucleosome core comprises approximately 150 base pairs of DNA associated with an octamer comprising two each of histones H2A, H2B, H3 and H4; and linker DNA (of variable length depending on the organism) extends between nucleosome cores. A molecule of histone H1 is generally associated with the linker DNA. For the purposes of the present disclosure, the term “chromatin” is meant to encompass all types of cellular nucleoprotein, both prokaryotic and eukaryotic. Cellular chromatin includes both chromosomal and episomal chromatin.

A “chromosome” is a chromatin complex comprising all or a portion of the genome of a cell. The genome of a cell is often characterized by its karyotype, which is the collection of all the chromosomes that comprise the genome of the cell. The genome of a cell can comprise one or more chromosomes.

An “episome” is a replicating nucleic acid, nucleoprotein complex or other structure comprising a nucleic acid that is not part of the chromosomal karyotype of a cell. Examples of episomes include plasmids and certain viral genomes.

An “accessible region” is a site in cellular chromatin in which a target site present in the nucleic acid can be bound by an exogenous molecule which recognizes the target site. Without wishing to be bound by any particular theory, it is believed that an accessible region is one that is not packaged into a nucleosomal structure. The distinct structure of an accessible region can often be detected by its sensitivity to chemical and enzymatic probes, for example, nucleases.

A “target site” or “target sequence” is a nucleic acid sequence that defines a portion of a nucleic acid to which a binding molecule will bind, provided sufficient conditions for binding exist.

An “exogenous” molecule is a molecule that is not normally present in a cell, but can be introduced into a cell by one or more genetic, biochemical or other methods. “Normal presence in the cell” is determined with respect to the particular developmental stage and environmental conditions of the cell. Thus, for example, a molecule that is present only during embryonic development of muscle is an exogenous molecule with respect to an adult muscle cell. Similarly, a molecule induced by heat shock is an exogenous molecule with respect to a non-heat-shocked cell. An exogenous molecule can comprise, for example, a functioning version of a malfunctioning endogenous molecule or a malfunctioning version of a normally-functioning endogenous molecule.

An exogenous molecule can be, among other things, a small molecule, such as is generated by a combinatorial chemistry process, or a macromolecule such as a protein, nucleic acid, carbohydrate, lipid, glycoprotein, lipoprotein, polysaccharide, any modified derivative of the above molecules, or any complex comprising one or more of the above molecules. Nucleic acids include DNA and RNA, can be single- or double-stranded; can be linear, branched or circular; and can be of any length. Nucleic acids include those capable of forming duplexes, as well as triplex-forming nucleic acids. See, for example, U.S. Pat. Nos. 5,176,996 and 5,422,251. Proteins include, but are not limited to, DNA-binding proteins, transcription factors, chromatin remodeling factors, methylated DNA binding proteins, polymerases, methylases, demethylases, acetylases, deacetylases, kinases, phosphatases, integrases, recombinases, ligases, topoisomerases, gyrases and helicases.

An exogenous molecule can be the same type of molecule as an endogenous molecule, e.g., an exogenous protein or nucleic acid. For example, an exogenous nucleic acid can comprise an infecting viral genome, a plasmid or episome introduced into a cell, or a chromosome that is not normally present in the cell. Methods for the introduction of exogenous molecules into cells are known to those of skill in the art and include, but are not limited to, lipid-mediated transfer (i.e., liposomes, including neutral and cationic lipids), electroporation, direct injection, cell fusion, particle bombardment, calcium phosphate co-precipitation, DEAE-dextran-mediated transfer and viral vector-mediated transfer. An exogenous molecule can also be the same type of molecule as an endogenous molecule but derived from a different species than the cell is derived from. For example, a human nucleic acid sequence may be introduced into a cell line originally derived from a mouse or hamster. Methods for the introduction of exogenous molecules into plant cells are known to those of skill in the art and include, but are not limited to, protoplast transformation, silicon carbide (e.g., WHISKERS™), Agrobacterium-mediated transformation, lipid-mediated transfer (i.e., liposomes, including neutral and cationic lipids), electroporation, direct injection, cell fusion, particle bombardment (e.g., using a “gene gun”), calcium phosphate co-precipitation, DEAE-dextran-mediated transfer and viral vector-mediated transfer.

By contrast, an “endogenous” molecule is one that is normally present in a particular cell at a particular developmental stage under particular environmental conditions. For example, an endogenous nucleic acid can comprise a chromosome, the genome of a mitochondrion, chloroplast or other organelle, or a naturally-occurring episomal nucleic acid. Additional endogenous molecules can include proteins, for example, transcription factors and enzymes.

As used herein, the term “product of an exogenous nucleic acid” includes both polynucleotide and polypeptide products, for example, transcription products (polynucleotides such as RNA) and translation products (polypeptides).

A “fusion” molecule is a molecule in which two or more subunit molecules are linked, preferably covalently. The subunit molecules can be the same chemical type of molecule, or can be different chemical types of molecules. Examples of the first type of fusion molecule include, but are not limited to, fusion proteins (for example, a fusion between a ZFP or TALE DNA-binding domain and one or more activation domains) and fusion nucleic acids (for example, a nucleic acid encoding the fusion protein described supra). Examples of the second type of fusion molecule include, but are not limited to, a fusion between a triplex-forming nucleic acid and a polypeptide, and a fusion between a minor groove binder and a nucleic acid.

Expression of a fusion protein in a cell can result from delivery of the fusion protein to the cell or by delivery of a polynucleotide encoding the fusion protein to a cell, wherein the polynucleotide is transcribed, and the transcript is translated, to generate the fusion protein. Trans-splicing, polypeptide cleavage and polypeptide ligation can also be involved in expression of a protein in a cell. Methods for polynucleotide and polypeptide delivery to cells are presented elsewhere in this disclosure.

A “gene,” for the purposes of the present disclosure, includes a DNA region encoding a gene product (see infra), as well as all DNA regions which regulate the production of the gene product, whether or not such regulatory sequences are adjacent to coding and/or transcribed sequences. Accordingly, a gene includes, but is not necessarily limited to, promoter sequences, terminators, translational regulatory sequences such as ribosome binding sites and internal ribosome entry sites, enhancers, silencers, insulators, boundary elements, replication origins, matrix attachment sites and locus control regions.

“Gene expression” refers to the conversion of the information contained in a gene into a gene product. A gene product can be the direct transcriptional product of a gene (e.g., mRNA, tRNA, rRNA, antisense RNA, ribozyme, structural RNA or any other type of RNA) or a protein produced by translation of an mRNA. Gene products also include RNAs which are modified, by processes such as capping, polyadenylation, methylation, and editing, and proteins modified by, for example, methylation, acetylation, phosphorylation, ubiquitination, ADP-ribosylation, myristoylation, and glycosylation.

“Modulation” of gene expression refers to a change in the activity of a gene. Modulation of expression can include, but is not limited to, gene activation and gene repression. Genome editing (e.g., cleavage, alteration, inactivation, random mutation) can be used to modulate expression. Gene inactivation refers to any reduction in gene expression as compared to a cell that does not include a ZFP, TALE or CRISPR/Cas system as described herein. Thus, gene inactivation may be partial or complete.

A “protected” mRNA is one in which the mRNA has been altered in some manner to increase the stability or translation of the mRNA. Examples of protections include replacement of up to 25% of the cytidine and uridine residues with 2-thiouridine (s2U) and 5-methylcytidine (m5C). The resulting mRNA exhibits less immunogenicity and more stability as compared with its unmodified counterpart (see Karikó et al. (2012), Molecular Therapy, Vol. 16, No. 11, pages 1833-1844). Other changes include the addition of a so-called ARCA cap, which increases the translatability of the in vitro produced mRNA (see U.S. Pat. No. 7,074,596).

A “region of interest” is any region of cellular chromatin, such as, for example, a gene or a non-coding sequence within or adjacent to a gene, in which it is desirable to bind an exogenous molecule. Binding can be for the purposes of targeted DNA cleavage and/or targeted recombination. A region of interest can be present in a chromosome, an episome, an organellar genome (e.g., mitochondrial, chloroplast), or an infecting viral genome, for example. A region of interest can be within the coding region of a gene, within transcribed non-coding regions such as, for example, leader sequences, trailer sequences or introns, or within non-transcribed regions, either upstream or downstream of the coding region. A region of interest can be as small as a single nucleotide pair or up to 2,000 nucleotide pairs in length, or any integral value of nucleotide pairs.

“Eukaryotic” cells include, but are not limited to, fungal cells (such as yeast), plant cells, animal cells, mammalian cells and human cells (e.g., T-cells).

The terms “operative linkage” and “operatively linked” (or “operably linked”) are used interchangeably with reference to a juxtaposition of two or more components (such as sequence elements), in which the components are arranged such that both components function normally and allow the possibility that at least one of the components can mediate a function that is exerted upon at least one of the other components. By way of illustration, a transcriptional regulatory sequence, such as a promoter, is operatively linked to a coding sequence if the transcriptional regulatory sequence controls the level of transcription of the coding sequence in response to the presence or absence of one or more transcriptional regulatory factors. A transcriptional regulatory sequence is generally operatively linked in cis with a coding sequence, but need not be directly adjacent to it. For example, an enhancer is a transcriptional regulatory sequence that is operatively linked to a coding sequence, even though they are not contiguous.

With respect to fusion polypeptides, the term “operatively linked” can refer to the fact that each of the components performs the same function in linkage to the other component as it would if it were not so linked. For example, with respect to a fusion polypeptide in which a ZFP, TALE or Cas DNA-binding domain is fused to an activation domain, the ZFP, TALE or Cas DNA-binding domain and the activation domain are in operative linkage if, in the fusion polypeptide, the ZFP, TALE or Cas DNA-binding domain portion is able to bind its target site and/or its binding site, while the activation domain is able to upregulate gene expression. With respect to a fusion polypeptide in which a ZFP, TALE or Cas DNA-binding domain is fused to a cleavage domain, the ZFP, TALE or Cas DNA-binding domain and the cleavage domain are in operative linkage if, in the fusion polypeptide, the ZFP, TALE or Cas DNA-binding domain portion is able to bind its target site and/or its binding site, while the cleavage domain is able to cleave DNA in the vicinity of the target site.

A “functional fragment” of a protein, polypeptide or nucleic acid is a protein, polypeptide or nucleic acid whose sequence is not identical to the full-length protein, polypeptide or nucleic acid, yet retains the same function as the full-length protein, polypeptide or nucleic acid. A functional fragment can possess more, fewer, or the same number of residues as the corresponding native molecule, and/or can contain one or more amino acid or nucleotide substitutions. Methods for determining the function of a nucleic acid (e.g., coding function, ability to hybridize to another nucleic acid) are well-known in the art. Similarly, methods for determining protein function are well-known. For example, the DNA-binding function of a polypeptide can be determined, for example, by filter-binding, electrophoretic mobility-shift, or immunoprecipitation assays. DNA cleavage can be assayed by gel electrophoresis. See Ausubel et al., supra. The ability of a protein to interact with another protein can be determined, for example, by co-immunoprecipitation, two-hybrid assays or complementation, both genetic and biochemical. See, for example, Fields et al. (1989) Nature 340:245-246; U.S. Pat. No. 5,585,245 and PCT WO 98/44350.

A “vector” is capable of transferring gene sequences to target cells. Typically, “vector construct,” “expression vector,” and “gene transfer vector” mean any nucleic acid construct capable of directing the expression of a gene of interest and which can transfer gene sequences to target cells. Thus, the term includes cloning and expression vehicles, as well as integrating vectors.

The terms “subject” and “patient” are used interchangeably and refer to mammals such as human patients and non-human primates, as well as experimental animals such as rabbits, dogs, cats, rats, mice, and other animals. Accordingly, the term “subject” or “patient” as used herein means any mammalian patient or subject to which the stem cells of the invention can be administered. Subjects of the present invention include those that have been exposed to one or more chemical toxins, including, for example, a nerve toxin.

“Stemness” refers to the relative ability of any cell to act in a stem cell-like manner, i.e., the degree of toti-, pluri-, or oligopotency and expanded or indefinite self-renewal that any particular stem cell may have.

Nucleases

Described herein are compositions, particularly nucleases, that are useful for in vivo cleavage of a donor molecule carrying a transgene and nucleases for cleavage of the genome of a cell such that the transgene is integrated into the genome in a targeted manner. In certain embodiments, one or more of the nucleases are naturally occurring. In other embodiments, one or more of the nucleases are non-naturally occurring, i.e., engineered in the DNA-binding domain and/or cleavage domain. For example, the DNA-binding domain of a naturally-occurring nuclease may be altered to bind to a selected target site (e.g., a meganuclease that has been engineered to bind to a site different than the cognate binding site). In other embodiments, the nuclease comprises heterologous DNA-binding and cleavage domains (e.g., zinc finger nucleases; TAL-effector domain DNA binding proteins; meganuclease DNA-binding domains with heterologous cleavage domains).

A. DNA-Binding Domains

In certain embodiments, the composition and methods described herein employ a meganuclease (homing endonuclease) DNA-binding domain for binding to the donor molecule and/or binding to the region of interest in the genome of the cell. Naturally-occurring meganucleases recognize 15-40 base-pair cleavage sites and are commonly grouped into four families: the LAGLIDADG (SEQ ID NO: 287) family, the GIY-YIG family, the His-Cys box family and the HNH family. Exemplary homing endonucleases include I-SceI, I-CeuI, PI-PspI, PI-Sce, I-SceIV, I-CsmI, I-PanI, I-SceII, I-PpoI, I-SceIII, I-CreI, I-TevI, I-TevII and I-TevIII. Their recognition sequences are known. See also U.S. Pat. Nos. 5,420,032; 6,833,252; Belfort et al. (1997) Nucleic Acids Res. 25:3379-3388; Dujon et al. (1989) Gene 82:115-118; Perler et al. (1994) Nucleic Acids Res. 22, 1125-1127; Jasin (1996) Trends Genet. 12:224-228; Gimble et al. (1996) J. Mol. Biol. 263:163-180; Argast et al. (1998) J. Mol. Biol. 280:345-353 and the New England Biolabs catalogue.

In certain embodiments, the methods and compositions described herein make use of a nuclease that comprises an engineered (non-naturally occurring) homing endonuclease (meganuclease). The recognition sequences of homing endonucleases and meganucleases such as I-SceI, I-CeuI, PI-PspI, PI-Sce, I-SceIV, I-CsmI, I-PanI, I-SceII, I-PpoI, I-SceIII, I-CreI, I-TevI, I-TevII and I-TevIII are known. See also U.S. Pat. Nos. 5,420,032; 6,833,252; Belfort et al. (1997) Nucleic Acids Res. 25:3379-3388; Dujon et al. (1989) Gene 82:115-118; Perler et al. (1994) Nucleic Acids Res. 22, 1125-1127; Jasin (1996) Trends Genet. 12:224-228; Gimble et al. (1996) J. Mol. Biol. 263:163-180; Argast et al. (1998) J. Mol. Biol. 280:345-353 and the New England Biolabs catalogue. In addition, the DNA-binding specificity of homing endonucleases and meganucleases can be engineered to bind non-natural target sites. See, for example, Chevalier et al. (2002) Molec. Cell 10:895-905; Epinat et al. (2003) Nucleic Acids Res. 31:2952-2962; Ashworth et al. (2006) Nature 441:656-659; Paques et al. (2007) Current Gene Therapy 7:49-66; U.S. Patent Publication No. 20070117128. The DNA-binding domains of the homing endonucleases and meganucleases may be altered in the context of the nuclease as a whole (i.e., such that the nuclease includes the cognate cleavage domain) or may be fused to a heterologous cleavage domain.

In other embodiments, the DNA-binding domain of one or more of the nucleases used in the methods and compositions described herein comprises a naturally occurring or engineered (non-naturally occurring) TAL effector DNA binding domain. See, e.g., U.S. Pat. No. 8,586,526, incorporated by reference in its entirety herein. The plant pathogenic bacteria of the genus Xanthomonas are known to cause many diseases in important crop plants. Pathogenicity of Xanthomonas depends on a conserved type III secretion (T3S) system which injects more than 25 different effector proteins into the plant cell. Among these injected proteins are transcription activator-like (TAL) effectors which mimic plant transcriptional activators and manipulate the plant transcriptome (see Kay et al (2007) Science 318:648-651). These proteins contain a DNA binding domain and a transcriptional activation domain. One of the most well characterized TAL-effectors is AvrBs3 from Xanthomonas campestris pv. vesicatoria (see Bonas et al (1989) Mol Gen Genet 218: 127-136 and WO2010079430). TAL-effectors contain a centralized domain of tandem repeats, each repeat containing approximately 34 amino acids, which are key to the DNA binding specificity of these proteins. In addition, they contain a nuclear localization sequence and an acidic transcriptional activation domain (for a review see Schornack S, et al (2006) J Plant Physiol 163(3): 256-272). In addition, in the phytopathogenic bacterium Ralstonia solanacearum, two genes, designated brg11 and hpx17, have been found that are homologous to the AvrBs3 family of Xanthomonas in the R. solanacearum biovar 1 strain GMI1000 and in the biovar 4 strain RS1000 (See Heuer et al (2007) Appl and Envir Micro 73(13): 4379-4384). These genes are 98.9% identical in nucleotide sequence to each other but differ by a deletion of 1,575 base pairs in the repeat domain of hpx17. However, both gene products have less than 40% sequence identity with AvrBs3 family proteins of Xanthomonas. See, e.g., U.S. Pat. No. 8,586,526, incorporated by reference in its entirety herein.

Specificity of these TAL effectors depends on the sequences found in the tandem repeats. The repeated sequence comprises approximately 102 base pairs (bp) and the repeats are typically 91-100% homologous with each other (Bonas et al, ibid). Polymorphism of the repeats is usually located at positions 12 and 13 and there appears to be a one-to-one correspondence between the identity of the hypervariable diresidues (RVD) at positions 12 and 13 with the identity of the contiguous nucleotides in the TAL-effector's target sequence (see Moscou and Bogdanove, (2009) Science 326:1501 and Boch et al (2009) Science 326:1509-1512). Experimentally, the natural code for DNA recognition of these TAL-effectors has been determined such that an HD sequence at positions 12 and 13 leads to a binding to cytosine (C), NG binds to T, NI to A, NS to A, C, G or T, NN binds to A or G, and IG binds to T. These DNA binding repeats have been assembled into proteins with new combinations and numbers of repeats, to make artificial transcription factors that are able to interact with new sequences and activate the expression of a non-endogenous reporter gene in plant cells (Boch et al, ibid). Engineered TAL proteins have been linked to a FokI cleavage half domain to yield a TAL effector domain nuclease fusion (TALEN) exhibiting activity in a yeast reporter assay (plasmid based target). See, e.g., U.S. Pat. No. 8,586,526; Christian et al (2010) Genetics epub 10.1534/genetics.110.120717. In certain embodiments, the TALE domain comprises an N-cap and/or C-cap as described in U.S. Pat. No. 8,586,526.
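By way of non-limiting illustration, the one-to-one correspondence between the diresidue at positions 12 and 13 of each repeat and a nucleotide of the target can be sketched as a simple lookup. The table below is an assumption made for illustration only, limited to four assignments reported in the literature cited above (HD to C, NG to T, NI to A, NN to A or G); the names `RVD_TO_BASE` and `predict_target` are hypothetical and do not describe any particular design tool.

```python
# Assumed illustrative RVD table; "R" is the IUPAC code for A or G.
RVD_TO_BASE = {"HD": "C", "NG": "T", "NI": "A", "NN": "R"}


def predict_target(rvds):
    """Return the predicted target sequence, one nucleotide per repeat."""
    bases = []
    for rvd in rvds:
        if rvd not in RVD_TO_BASE:
            raise ValueError(f"no assignment in this sketch for RVD {rvd!r}")
        bases.append(RVD_TO_BASE[rvd])
    return "".join(bases)


# Hypothetical five-repeat array, read from the first repeat to the last.
predicted = predict_target(["HD", "NG", "NI", "NN", "HD"])
print(predicted)
```

Because each repeat contributes exactly one position, the predicted target has the same length as the repeat array; assembling repeats in a new order changes the predicted sequence accordingly.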

In certain embodiments, the DNA binding domain of one or more of the nucleases used for in vivo cleavage and/or targeted cleavage of the genome of a cell comprises a zinc finger protein. Preferably, the zinc finger protein is non-naturally occurring in that it is engineered to bind to a target site of choice. See, for example, Beerli et al. (2002) Nature Biotechnol. 20:135-141; Pabo et al. (2001) Ann. Rev. Biochem. 70:313-340; Isalan et al. (2001) Nature Biotechnol. 19:656-660; Segal et al. (2001) Curr. Opin. Biotechnol. 12:632-637; Choo et al. (2000) Curr. Opin. Struct. Biol. 10:411-416; U.S. Pat. Nos. 6,453,242; 6,534,261; 6,599,692; 6,503,717; 6,689,558; 7,030,215; 6,794,136; 7,067,317; 7,262,054; 7,070,934; 7,361,635; 7,253,273; and U.S. Patent Publication Nos. 2005/0064474; 2007/0218528; 2005/0267061, all incorporated herein by reference in their entireties.

An engineered zinc finger binding domain can have a novel binding specificity, compared to a naturally-occurring zinc finger protein. Engineering methods include, but are not limited to, rational design and various types of selection. Rational design includes, for example, using databases comprising triplet (or quadruplet) nucleotide sequences and individual zinc finger amino acid sequences, in which each triplet or quadruplet nucleotide sequence is associated with one or more amino acid sequences of zinc fingers which bind the particular triplet or quadruplet sequence. See, for example, co-owned U.S. Pat. Nos. 6,453,242 and 6,534,261, incorporated by reference herein in their entireties.

Exemplary selection methods, including phage display and two-hybrid systems, are disclosed in U.S. Pat. Nos. 5,789,538; 5,925,523; 6,007,988; 6,013,453; 6,410,248; 6,140,466; 6,200,759; and 6,242,568; as well as WO 98/37186; WO 98/53057; WO 00/27878; WO 01/88197 and GB 2,338,237. In addition, enhancement of binding specificity for zinc finger binding domains has been described, for example, in co-owned WO 02/077227.

In addition, as disclosed in these and other references, zinc finger domains and/or multi-fingered zinc finger proteins may be linked together using any suitable linker sequences, including for example, linkers of 5 or more amino acids in length. See, also, U.S. Pat. Nos. 6,479,626; 6,903,185; and 7,153,949 for exemplary linker sequences 6 or more amino acids in length. The proteins described herein may include any combination of suitable linkers between the individual zinc fingers of the protein.

Selection of target sites; ZFPs and methods for design and construction of fusion proteins (and polynucleotides encoding same) are known to those of skill in the art and described in detail in U.S. Pat. Nos. 6,140,081; 5,789,538; 6,453,242; 6,534,261; 5,925,523; 6,007,988; 6,013,453; 6,200,759; WO 95/19431; WO 96/06166; WO 98/53057; WO 98/54311; WO 00/27878; WO 01/60970; WO 01/88197; WO 02/099084; WO 98/53058; WO 98/53059; WO 98/53060; WO 02/016536 and WO 03/016496.

Nearly any linker (spacer) may be used between one or more of the components of the DNA-binding domain (e.g., zinc fingers), between one or more DNA-binding domains and/or between the DNA-binding domain and the functional domain (e.g., nuclease). Non-limiting examples of suitable linker sequences include U.S. Pat. Nos. 8,772,453; 7,888,121; 6,479,626; 6,903,185; and 7,153,949; U.S. Publication Nos. 20090305419 and 20150064789. Thus, the proteins described herein may include any combination of suitable linkers between the individual DNA-binding components and/or between the DNA-binding domain and the functional domain of the compositions described herein.

The CRISPR (clustered regularly interspaced short palindromic repeats) locus, which encodes RNA components of the system, and the cas (CRISPR-associated) locus, which encodes proteins (Jansen et al., 2002. Mol. Microbiol. 43: 1565-1575; Makarova et al., 2002. Nucleic Acids Res. 30: 482-496; Makarova et al., 2006. Biol. Direct 1: 7; Haft et al., 2005. PLoS Comput. Biol. 1: e60) make up the gene sequences of the CRISPR/Cas nuclease system. CRISPR loci in microbial hosts contain a combination of CRISPR-associated (Cas) genes as well as non-coding RNA elements capable of programming the specificity of the CRISPR-mediated nucleic acid cleavage.

The Type II CRISPR is one of the most well characterized systems and carries out a targeted DNA double-strand break in four sequential steps. First, two non-coding RNAs, the pre-crRNA array and tracrRNA, are transcribed from the CRISPR locus. Second, tracrRNA hybridizes to the repeat regions of the pre-crRNA and mediates the processing of pre-crRNA into mature crRNAs containing individual spacer sequences. Third, the mature crRNA:tracrRNA complex directs Cas9 to the target DNA via Watson-Crick base-pairing between the spacer on the crRNA and the protospacer on the target DNA next to the protospacer adjacent motif (PAM), an additional requirement for target recognition. Finally, Cas9 mediates cleavage of target DNA to create a double-stranded break within the protospacer. Activity of the CRISPR/Cas system comprises three steps: (i) insertion of alien DNA sequences into the CRISPR array to prevent future attacks, in a process called ‘adaptation’, (ii) expression of the relevant proteins, as well as expression and processing of the array, followed by (iii) RNA-mediated interference with the alien nucleic acid. Thus, in the bacterial cell, several of the so-called ‘Cas’ proteins are involved with the natural function of the CRISPR/Cas system and serve roles in functions such as insertion of the alien DNA etc.
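By way of non-limiting illustration, the requirement that a protospacer lie next to a PAM can be sketched as a scan of one DNA strand for candidate sites. The sketch assumes a 20-nucleotide protospacer followed immediately by a PAM of the form NGG; both the length and the motif are assumptions made for illustration and are not stated in this disclosure. The function name `find_protospacers` and the example sequence are hypothetical.

```python
import re


def find_protospacers(dna, spacer_len=20):
    """List (start, protospacer, PAM) for sites on the given strand.

    Assumes the PAM is NGG and sits directly 3' of the protospacer.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % spacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


# Hypothetical target: a 20 nt protospacer, an AGG PAM and two trailing bases.
protospacer = "GATTACAGATTACAGATTAC"
sites = find_protospacers(protospacer + "AGG" + "TT")
for start, spacer, pam in sites:
    print(start, spacer, pam)
```

Only the supplied strand is scanned; a fuller search would also examine the reverse complement, and a different nuclease would call for a different assumed motif.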

In certain embodiments, the Cas protein may be a “functional derivative” of a naturally occurring Cas protein. A “functional derivative” of a native sequence polypeptide is a compound having a qualitative biological property in common with a native sequence polypeptide. “Functional derivatives” include, but are not limited to, fragments of a native sequence and derivatives of a native sequence polypeptide and its fragments, provided that they have a biological activity in common with a corresponding native sequence polypeptide. A biological activity contemplated herein is the ability of the functional derivative to hydrolyze a DNA substrate into fragments. The term “derivative” encompasses both amino acid sequence variants of polypeptide, covalent modifications, and fusions thereof. Suitable derivatives of a Cas polypeptide or a fragment thereof include but are not limited to mutants, fusions, covalent modifications of Cas protein or a fragment thereof. Cas protein, which includes Cas protein or a fragment thereof, as well as derivatives of Cas protein or a fragment thereof, may be obtainable from a cell or synthesized chemically or by a combination of these two procedures. The cell may be a cell that naturally produces Cas protein, or a cell that naturally produces Cas protein and is genetically engineered to produce the endogenous Cas protein at a higher expression level or to produce a Cas protein from an exogenously introduced nucleic acid, which nucleic acid encodes a Cas that is the same as or different from the endogenous Cas. In some cases, the cell does not naturally produce Cas protein and is genetically engineered to produce a Cas protein.

In some embodiments, the DNA binding domain is part of a TtAgo system (see Swarts et al, ibid; Sheng et al, ibid). In eukaryotes, gene silencing is mediated by the Argonaute (Ago) family of proteins. In this paradigm, Ago is bound to small (19-31 nt) RNAs. This protein-RNA silencing complex recognizes target RNAs via Watson-Crick base pairing between the small RNA and the target and endonucleolytically cleaves the target RNA (Vogel (2014) Science 344:972-973). In contrast, prokaryotic Ago proteins bind to small single-stranded DNA fragments and likely function to detect and remove foreign (often viral) DNA (Yuan et al., (2005) Mol. Cell 19, 405; Olovnikov, et al. (2013) Mol. Cell 51, 594; Swarts et al., ibid). Exemplary prokaryotic Ago proteins include those from Aquifex aeolicus, Rhodobacter sphaeroides, and Thermus thermophilus.

One of the most well-characterized prokaryotic Ago proteins is the one from T. thermophilus (TtAgo; Swarts et al. ibid). TtAgo associates with either 15 nt or 13-25 nt single-stranded DNA fragments with 5′ phosphate groups. This “guide DNA” bound by TtAgo serves to direct the protein-DNA complex to bind a Watson-Crick complementary DNA sequence in a third-party molecule of DNA. Once the sequence information in these guide DNAs has allowed identification of the target DNA, the TtAgo-guide DNA complex cleaves the target DNA. Such a mechanism is also supported by the structure of the TtAgo-guide DNA complex while bound to its target DNA (G. Sheng et al., ibid). Ago from Rhodobacter sphaeroides (RsAgo) has similar properties (Olovnikov et al. ibid).

Exogenous guide DNAs of arbitrary DNA sequence can be loaded onto the TtAgo protein (Swarts et al. ibid.). Since the specificity of TtAgo cleavage is directed by the guide DNA, a TtAgo-DNA complex formed with an exogenous, investigator-specified guide DNA will therefore direct TtAgo target DNA cleavage to a complementary investigator-specified target DNA. In this way, one may create a targeted double-strand break in DNA. Use of the TtAgo-guide DNA system (or orthologous Ago-guide DNA systems from other organisms) allows for targeted cleavage of genomic DNA within cells. Such cleavage can be either single- or double-stranded. For cleavage of mammalian genomic DNA, it would be preferable to use a version of TtAgo codon optimized for expression in mammalian cells. Further, it might be preferable to treat cells with a TtAgo-DNA complex formed in vitro where the TtAgo protein is fused to a cell-penetrating peptide. Further, it might be preferable to use a version of the TtAgo protein that has been altered via mutagenesis to have improved activity at 37 degrees Celsius. Ago-RNA-mediated DNA cleavage could be used to effect a panoply of outcomes including gene knock-out, targeted gene addition, gene correction, and targeted gene deletion using techniques standard in the art for exploitation of DNA breaks.
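As a rough illustration of how an investigator-specified guide DNA determines the target, the following sketch locates sites in a DNA strand that are Watson-Crick complementary to a guide. The function names are hypothetical, the 13-25 nt length check reflects the guide sizes mentioned above, and real target selection would involve considerations not modeled here.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]

def find_guide_targets(guide_dna, target_dna, min_len=13, max_len=25):
    """Return 0-based start positions in target_dna (read 5' to 3') of sites
    that are fully complementary to guide_dna.

    Hypothetical helper: exact matching only, one strand only.
    """
    if not min_len <= len(guide_dna) <= max_len:
        raise ValueError("guide length outside the assumed 13-25 nt range")
    site = reverse_complement(guide_dna)
    target = target_dna.upper()
    hits = []
    start = target.find(site)
    while start != -1:
        hits.append(start)
        start = target.find(site, start + 1)  # allow overlapping hits
    return hits
```

A guide that pairs with one strand is found by searching that strand for the guide's reverse complement, which is why the helper converts the guide before scanning.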

Thus, the nuclease comprises a DNA-binding domain that specifically binds to a target site in any gene into which it is desired to insert a donor (transgene).

B. Cleavage Domains

Any suitable cleavage domain can be operatively linked to a DNA-binding domain to form a nuclease. For example, ZFP DNA-binding domains have been fused to nuclease domains to create ZFNs, a functional entity that is able to recognize its intended nucleic acid target through its engineered (ZFP) DNA binding domain and cause the DNA to be cut near the ZFP binding site via the nuclease activity. See, e.g., Kim et al. (1996) Proc Nat'l Acad Sci USA 93(3):1156-1160. More recently, ZFNs have been used for genome modification in a variety of organisms. See, for example, United States Patent Publications 20030232410; 20050208489; 20050026157; 20050064474; 20060188987; 20060063231; and International Publication WO 07/014275. Likewise, TALE DNA-binding domains have been fused to nuclease domains to create TALENs. See, e.g., U.S. Publication No. 20110301073.

As noted above, the cleavage domain may be heterologous to the DNA-binding domain, for example a zinc finger DNA-binding domain and a cleavage domain from a nuclease or a TALEN DNA-binding domain and a cleavage domain, or meganuclease DNA-binding domain and cleavage domain from a different nuclease. Heterologous cleavage domains can be obtained from any endonuclease or exonuclease. Exemplary endonucleases from which a cleavage domain can be derived include, but are not limited to, restriction endonucleases and homing endonucleases. See, for example, 2002-2003 Catalogue, New England Biolabs, Beverly, Mass.; and Belfort et al. (1997) Nucleic Acids Res. 25:3379-3388. Additional enzymes which cleave DNA are known (e.g., S1 Nuclease; mung bean nuclease; pancreatic DNase I; micrococcal nuclease; yeast HO endonuclease; see also Linn et al. (eds.) Nucleases, Cold Spring Harbor Laboratory Press, 1993). One or more of these enzymes (or functional fragments thereof) can be used as a source of cleavage domains and cleavage half-domains.

Similarly, a cleavage half-domain can be derived from any nuclease or portion thereof, as set forth above, that requires dimerization for cleavage activity. In general, two fusion proteins are required for cleavage if the fusion proteins comprise cleavage half-domains. Alternatively, a single protein comprising two cleavage half-domains can be used. The two cleavage half-domains can be derived from the same endonuclease (or functional fragments thereof), or each cleavage half-domain can be derived from a different endonuclease (or functional fragments thereof). In addition, the target sites for the two fusion proteins are preferably disposed, with respect to each other, such that binding of the two fusion proteins to their respective target sites places the cleavage half-domains in a spatial orientation to each other that allows the cleavage half-domains to form a functional cleavage domain, e.g., by dimerizing. Thus, in certain embodiments, the near edges of the target sites are separated by 5-8 nucleotides or by 15-18 nucleotides. However, any integral number of nucleotides or nucleotide pairs can intervene between two target sites (e.g., from 2 to 50 nucleotide pairs or more). In general, the site of cleavage lies between the target sites.
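The spacing criterion for paired target sites can be checked with a short sketch. The coordinate convention (0-based, half-open (start, end) intervals) and the function names are assumptions made for illustration; only the 5-8 and 15-18 nucleotide ranges come from the text above.

```python
# Gap ranges (in nucleotides) named in the text for certain embodiments.
PREFERRED_GAPS = (range(5, 9), range(15, 19))

def gap_between_sites(site_a, site_b):
    """Return the number of nucleotides between the near edges of two
    target sites given as (start, end) half-open intervals."""
    (l_start, l_end), (r_start, r_end) = sorted([site_a, site_b])
    if r_start < l_end:
        raise ValueError("target sites overlap")
    return r_start - l_end

def is_preferred_spacing(gap):
    """True if the gap falls in the 5-8 or 15-18 nucleotide range."""
    return any(gap in allowed for allowed in PREFERRED_GAPS)
```

For example, sites at (10, 22) and (28, 40) have a 6-nucleotide gap, which falls within the first range; other gaps remain permissible under the broader statement in the text, so the check is a filter for the named embodiments rather than a hard requirement.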

Restriction endonucleases (restriction enzymes) are present in many species and are capable of sequence-specific binding to DNA (at a recognition site), and cleaving DNA at or near the site of binding. Certain restriction enzymes (e.g., Type IIS) cleave DNA at sites removed from the recognition site and have separable binding and cleavage domains. For example, the Type IIS enzyme Fok I catalyzes double-stranded cleavage of DNA, at 9 nucleotides from its recognition site on one strand and 13 nucleotides from its recognition site on the other. See, for example, U.S. Pat. Nos. 5,356,802; 5,436,150 and 5,487,994; as well as Li et al. (1992) Proc. Natl. Acad. Sci. USA 89:4275-4279; Li et al. (1993) Proc. Natl. Acad. Sci. USA 90:2764-2768; Kim et al. (1994a) Proc. Natl. Acad. Sci. USA 91:883-887; Kim et al. (1994b) J. Biol. Chem. 269:31,978-31,982. Thus, in one embodiment, fusion proteins comprise the cleavage domain (or cleavage half-domain) from at least one Type IIS restriction enzyme and one or more zinc finger binding domains, which may or may not be engineered.
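The offset cleavage of a Type IIS enzyme can be sketched as simple coordinate arithmetic. The recognition sequence GGATG used for Fok I below is an assumption not stated in this passage, as are the function name and the choice to scan only the strand supplied; the 9 and 13 nucleotide offsets are those given above.

```python
def foki_cut_positions(dna, site="GGATG", top_offset=9, bottom_offset=13):
    """For each occurrence of the recognition site on the given strand,
    return (site_start, top_cut, bottom_cut).

    Cut positions are 0-based boundaries counted along the supplied strand:
    a cut at position p falls between bases p-1 and p. The site sequence is
    an illustrative assumption; reverse-orientation sites are not scanned.
    """
    dna = dna.upper()
    cuts = []
    start = dna.find(site)
    while start != -1:
        site_end = start + len(site)
        top_cut = site_end + top_offset        # 9 nt past the site
        bottom_cut = site_end + bottom_offset  # 13 nt past the site
        if bottom_cut <= len(dna):             # skip sites too near the end
            cuts.append((start, top_cut, bottom_cut))
        start = dna.find(site, start + 1)
    return cuts
```

Because the two strands are cut at different distances from the site, the difference between the two offsets gives the length of the resulting single-stranded overhang (4 nucleotides with the values above).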

An exemplary Type IIS restriction enzyme, whose cleavage domain is separable from the binding domain, is FokI. This particular enzyme is active as a dimer. Bitinaite et al. (1998) Proc. Natl. Acad. Sci. USA 95: 10,570-10,575. Accordingly, for the purposes of the present disclosure, the portion of the Fok I enzyme used in the disclosed fusion proteins is considered a cleavage half-domain. Thus, for targeted double-stranded cleavage and/or targeted replacement of cellular sequences using zinc finger-Fok I fusions, two fusion proteins, each comprising a FokI cleavage half-domain, can be used to reconstitute a catalytically active cleavage domain. Alternatively, a single polypeptide molecule containing a zinc finger binding domain and two Fok I cleavage half-domains can also be used. Parameters for targeted cleavage and targeted sequence alteration using zinc finger-Fok I fusions are provided elsewhere in this disclosure.

A cleavage domain or cleavage half-domain can be any portion of a protein that retains cleavage activity, or that retains the ability to multimerize (e.g., dimerize) to form a functional cleavage domain.

Exemplary Type IIS restriction enzymes are described in International Publication WO 07/014275, incorporated herein in its entirety. Additional restriction enzymes also contain separable binding and cleavage domains, and these are contemplated by the present disclosure. See, for example, Roberts et al. (2003) Nucleic Acids Res. 31:418-420.

In certain embodiments, the cleavage domain comprises one or more engineered cleavage half-domains (also referred to as dimerization domain mutants) that minimize or prevent homodimerization, as described, for example, in U.S. Pat. Nos. 7,914,796; 8,034,598 and 8,623,618, the disclosures of all of which are incorporated by reference in their entireties herein. Amino acid residues at positions 446, 447, 479, 483, 484, 486, 487, 490, 491, 496, 498, 499, 500, 531, 534, 537, and 538 of FokI are all targets for influencing dimerization of the FokI cleavage half-domains.

Exemplary engineered cleavage half-domains of FokI that form obligate heterodimers include a pair in which a first cleavage half-domain includes mutations at amino acid residues at positions 490 and 538 of FokI and a second cleavage half-domain includes mutations at amino acid residues 486 and 499.

Thus, in one embodiment, a mutation at 490 replaces Glu (E) with Lys (K); the mutation at 538 replaces Ile (I) with Lys (K); the mutation at 486 replaces Gln (Q) with Glu (E); and the mutation at position 499 replaces Ile (I) with Leu (L). Specifically, the engineered cleavage half-domains described herein were prepared by mutating positions 490 (E→K) and 538 (I→K) in one cleavage half-domain to produce an engineered cleavage half-domain designated “E490K:I538K” and by mutating positions 486 (Q→E) and 499 (I→L) in another cleavage half-domain to produce an engineered cleavage half-domain designated “Q486E:I499L”. The engineered cleavage half-domains described herein are obligate heterodimer mutants in which aberrant cleavage is minimized or abolished. See, e.g., U.S. Patent Publication No. 2008/0131962, the disclosure of which is incorporated by reference in its entirety for all purposes. In certain embodiments, the engineered cleavage half-domain comprises mutations at positions 486, 499 and 496 (numbered relative to wild-type FokI), for instance mutations that replace the wild type Gln (Q) residue at position 486 with a Glu (E) residue, the wild type Ile (I) residue at position 499 with a Leu (L) residue and the wild-type Asn (N) residue at position 496 with an Asp (D) or Glu (E) residue (also referred to as “ELD” and “ELE” domains, respectively). In other embodiments, the engineered cleavage half-domain comprises mutations at positions 490, 538 and 537 (numbered relative to wild-type FokI), for instance mutations that replace the wild type Glu (E) residue at position 490 with a Lys (K) residue, the wild type Ile (I) residue at position 538 with a Lys (K) residue, and the wild-type His (H) residue at position 537 with a Lys (K) residue or an Arg (R) residue (also referred to as “KKK” and “KKR” domains, respectively). In other embodiments, the engineered cleavage half-domain comprises mutations at positions 490 and 537 (numbered relative to wild-type FokI), for instance mutations that replace the wild type Glu (E) residue at position 490 with a Lys (K) residue and the wild-type His (H) residue at position 537 with a Lys (K) residue or an Arg (R) residue (also referred to as “KIK” and “KIR” domains, respectively). See, e.g., U.S. Pat. Nos. 7,914,796; 8,034,598 and 8,623,618. In other embodiments, the engineered cleavage half-domain comprises the “Sharkey” and/or “Sharkey′” mutations (see Guo et al, (2010) J. Mol. Biol. 400(1):96-107).
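The naming scheme for the engineered half-domains can be made concrete with a small sketch. It treats a three-letter name as the residues found at positions (486, 499, 496) or (490, 538, 537), in that order, with unmutated positions contributing their wild-type residue; that reading, the colon-separated label format for three mutations, and the helper names are inferences drawn from the examples in the text rather than a stated rule.

```python
import re

# Wild-type FokI residues at the positions discussed in the text.
WILD_TYPE = {486: "Q", 490: "E", 496: "N", 499: "I", 537: "H", 538: "I"}

def parse_mutations(label):
    """Parse a label such as 'E490K:I538K' into {490: 'K', 538: 'K'},
    checking each stated wild-type residue against WILD_TYPE."""
    mutations = {}
    for token in label.split(":"):
        m = re.fullmatch(r"([A-Z])(\d+)([A-Z])", token)
        if m is None:
            raise ValueError("unrecognized mutation token: " + token)
        wt, pos, new = m.group(1), int(m.group(2)), m.group(3)
        if WILD_TYPE.get(pos) != wt:
            raise ValueError("wild-type residue mismatch in " + token)
        mutations[pos] = new
    return mutations

def short_name(label, positions):
    """Build the three-letter domain name from the residues at the given
    positions, falling back to wild type where no mutation is listed."""
    mutations = parse_mutations(label)
    return "".join(mutations.get(p, WILD_TYPE[p]) for p in positions)
```

Under this reading, short_name("Q486E:I499L:N496D", (486, 499, 496)) gives "ELD", and short_name("E490K:H537K", (490, 538, 537)) gives "KIK", with the middle letter being the unmutated isoleucine at position 538.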

Engineered cleavage half-domains described herein can be prepared using any suitable method, for example, by site-directed mutagenesis of wild-type cleavage half-domains (Fok I) as described in U.S. Patent Publication Nos. 20050064474; 20080131962; and 20110201055.

Alternatively, nucleases may be assembled in vivo at the nucleic acid target site using so-called “split-enzyme” technology (see, e.g. U.S. Patent Publication No. 20090068164). Components of such split enzymes may be expressed either on separate expression constructs, or can be linked in one open reading frame where the individual components are separated, for example, by a self-cleaving 2A peptide or IRES sequence. Components may be individual zinc finger binding domains or domains of a meganuclease nucleic acid binding domain.

Nucleases can be screened for activity prior to use, for example in a yeast-based chromosomal system as described in WO 2009/042163 and 20090068164. Nuclease expression constructs can be readily designed using methods known in the art. See, e.g., United States Patent Publications 20030232410; 20050208489; 20050026157; 20050064474; 20060188987; 20060063231; and International Publication WO 07/014275. Expression of the nuclease may be under the control of a constitutive promoter or an inducible promoter, for example the galactokinase promoter which is activated (de-repressed) in the presence of raffinose and/or galactose and repressed in the presence of glucose.

The Cas9 related CRISPR/Cas system comprises two RNA non-coding components: tracrRNA and a pre-crRNA array containing nuclease guide sequences (spacers) interspaced by identical direct repeats (DRs). To use a CRISPR/Cas system to accomplish genome engineering, both functions of these RNAs must be present (see Cong et al, (2013) Sciencexpress 1/10.1126/science 1231143). In some embodiments, the tracrRNA and pre-crRNAs are supplied via separate expression constructs or as separate RNAs. In other embodiments, a chimeric RNA is constructed where an engineered mature crRNA (conferring target specificity) is fused to a tracrRNA (supplying interaction with the Cas9) to create a chimeric crRNA-tracrRNA hybrid (also termed a single guide RNA) (see Jinek, ibid and Cong, ibid).

Target Sites

As described in detail above, DNA-binding domains can be engineered to bind to any sequence of choice. An engineered DNA-binding domain can have a novel binding specificity, compared to a naturally-occurring DNA-binding domain. In certain embodiments, the DNA-binding domains bind to a sequence within a BCL11A enhancer sequence, for example a target site (typically 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21 or even more base pairs) that is between exon 2 and exon 3 of BCL11A, including DNA-binding domains that bind to a sequence within a DNAseI hypersensitive site in the BCL11A enhancer sequence (e.g., +55, +58, +62; see FIG. 11). Engineering methods include, but are not limited to, rational design and various types of selection. Rational design includes, for example, using databases comprising triplet (or quadruplet) nucleotide sequences and individual zinc finger amino acid sequences, in which each triplet or quadruplet nucleotide sequence is associated with one or more amino acid sequences of zinc fingers which bind the particular triplet or quadruplet sequence. See, for example, co-owned U.S. Pat. Nos. 6,453,242 and 6,534,261, incorporated by reference herein in their entireties. Rational design of TAL-effector domains can also be performed. See, e.g., U.S. Publication No. 20110301073.

Exemplary selection methods applicable to DNA-binding domains, including phage display and two-hybrid systems, are disclosed in U.S. Pat. Nos. 5,789,538; 5,925,523; 6,007,988; 6,013,453; 6,410,248; 6,140,466; 6,200,759; and 6,242,568; as well as WO 98/37186; WO 98/53057; WO 00/27878; WO 01/88197 and GB 2,338,237. In addition, enhancement of binding specificity for zinc finger binding domains has been described, for example, in co-owned WO 02/077227.

Selection of target sites; nucleases and methods for design and construction of fusion proteins (and polynucleotides encoding same) are known to those of skill in the art and described in detail in U.S. Patent Application Publication Nos. 20050064474 and 20060188987, incorporated by reference in their entireties herein.

In addition, as disclosed in these and other references, DNA-binding domains (e.g., multi-fingered zinc finger proteins) and/or fusions of DNA-binding domain(s) and functional domain(s) may be linked together using any suitable linker sequences, including for example, linkers of 5 or more amino acids. See, e.g., U.S. Pat. Nos. 8,772,453; 7,888,121 (e.g., “ZC” linker); U.S. Pat. Nos. 6,479,626; 6,903,185; and 7,153,949; U.S. Publication Nos. 20090305419 and 20150064789. The proteins described herein may include any combination of suitable linkers between the individual DNA-binding domains of the protein. See, also, U.S. Pat. No. 8,586,526.

Donors

In certain embodiments, the present disclosure relates to nuclease-mediated targeted integration of an exogenous sequence into the genome of a cell using the BCL11A enhancer region-binding molecules described herein. As noted above, an exogenous sequence (also called a “donor sequence” or “donor” or “transgene”) can be inserted, for example for deletion of a specified region and/or correction of a mutant gene or for increased expression of a wild-type gene. It will be readily apparent that the donor sequence is typically not identical to the genomic sequence where it is placed. A donor sequence can contain a non-homologous sequence flanked by two regions of homology to allow for efficient HDR at the location of interest or can be integrated via non-homology directed repair mechanisms. Additionally, donor sequences can comprise a vector molecule containing sequences that are not homologous to the region of interest in cellular chromatin. A donor molecule can contain several, discontinuous regions of homology to cellular chromatin, and, for example, lead to a deletion of a BCL11A enhancer region (or a fragment thereof) when used as a substrate for repair of a DSB induced by one of the nucleases described here. Further, for targeted insertion of sequences not normally present in a region of interest, said sequences can be present in a donor nucleic acid molecule and flanked by regions of homology to sequence in the region of interest.
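The two donor layouts described above (a non-homologous insert flanked by homology arms, and discontinuous homology regions that lead to a deletion) can be sketched with plain string operations. The function names, the single-string genome model and the arm lengths are hypothetical simplifications for illustration only.

```python
def build_insertion_donor(genome, cut_site, insert, arm_len):
    """Return left_arm + insert + right_arm, where the arms are the genomic
    sequences immediately flanking cut_site (a 0-based boundary index)."""
    if cut_site - arm_len < 0 or cut_site + arm_len > len(genome):
        raise ValueError("homology arm extends past the end of the sequence")
    left_arm = genome[cut_site - arm_len:cut_site]
    right_arm = genome[cut_site:cut_site + arm_len]
    return left_arm + insert + right_arm

def build_deletion_donor(genome, del_start, del_end, arm_len):
    """Return a donor whose two homology regions flank genome[del_start:del_end];
    repair templated by this donor would omit the region between them."""
    if del_start - arm_len < 0 or del_end + arm_len > len(genome):
        raise ValueError("homology arm extends past the end of the sequence")
    return genome[del_start - arm_len:del_start] + genome[del_end:del_end + arm_len]
```

With the toy sequence "AAAACCCCGGGGTTTT", an insertion donor built at position 8 with 4-nucleotide arms reads CCCC, then the insert, then GGGG, while a deletion donor spanning positions 4 to 12 joins AAAA directly to TTTT.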

Polynucleotides for insertion can also be referred to as “exogenous” polynucleotides, “donor” polynucleotides or molecules or “transgenes.” The donor polynucleotide can be DNA or RNA, single-stranded and/or double-stranded and can be introduced into a cell in linear or circular form. See, e.g., U.S. Patent Publication Nos. 20100047805 and 20110207221. The donor sequence(s) are preferably contained within a DNA MC, which may be introduced into the cell in circular or linear form. If introduced in linear form, the ends of the donor sequence can be protected (e.g., from exonucleolytic degradation) by methods known to those of skill in the art. For example, one or more dideoxynucleotide residues are added to the 3′ terminus of a linear molecule and/or self-complementary oligonucleotides are ligated to one or both ends. See, for example, Chang et al. (1987) Proc. Natl. Acad. Sci. USA 84:4959-4963; Nehls et al. (1996) Science 272:886-889. Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino group(s) and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues. If introduced in double-stranded form, the donor may include one or more nuclease target sites, for example, nuclease target sites flanking the transgene to be integrated into the cell's genome. See, e.g., U.S. Patent Publication No. 20130326645.

A polynucleotide can be introduced into a cell as part of a vector molecule having additional sequences such as, for example, replication origins, promoters and genes encoding antibiotic resistance. Moreover, donor polynucleotides can be introduced as naked nucleic acid, as nucleic acid complexed with an agent such as a liposome or poloxamer, or can be delivered by viruses (e.g., adenovirus, AAV, herpesvirus, retrovirus, lentivirus and integrase defective lentivirus (IDLV)).

In certain embodiments, the double-stranded donor includes sequences (e.g., coding sequences, also referred to as transgenes) greater than 1 kb in length, for example between 2 and 200 kb, between 2 and 10 kb (or any value therebetween). The double-stranded donor also includes at least one nuclease target site, for example. In certain embodiments, the donor includes at least 2 target sites, for example for a pair of ZFNs or TALENs. Typically, the nuclease target sites are outside the transgene sequences, for example, 5′ and/or 3′ to the transgene sequences, for cleavage of the transgene. The nuclease cleavage site(s) may be for any nuclease(s). In certain embodiments, the nuclease target site(s) contained in the double-stranded donor are for the same nuclease(s) used to cleave the endogenous target into which the cleaved donor is integrated via homology-independent methods.

The donor is generally inserted so that its expression is driven by the endogenous promoter at the integration site, namely the promoter that drives expression of the endogenous gene into which the donor is inserted (e.g., globin, AAVS1, etc.). However, it will be apparent that the donor may comprise a promoter and/or enhancer, for example a constitutive promoter or an inducible or tissue specific promoter.

The donor molecule may be inserted into an endogenous gene such that all, some or none of the endogenous gene is expressed. In other embodiments, the transgene (e.g., with or without globin encoding sequences) is integrated into any endogenous locus, for example a safe-harbor locus. See, e.g., US patent publications 20080299580; 20080159996 and 20100218264.

Furthermore, although not required for expression, exogenous sequences may also include transcriptional or translational regulatory sequences, for example, promoters, enhancers, insulators, internal ribosome entry sites, sequences encoding 2A peptides and/or polyadenylation signals.

The transgenes carried on the donor sequences described herein may be isolated from plasmids, cells or other sources using standard techniques known in the art such as PCR. Donors for use can include varying types of topology, including circular supercoiled, circular relaxed, linear and the like. Alternatively, they may be chemically synthesized using standard oligonucleotide synthesis techniques. In addition, donors may be methylated or lack methylation. Donors may be in the form of bacterial or yeast artificial chromosomes (BACs or YACs).

The double-stranded donor polynucleotides described herein may include one or more non-natural bases and/or backbones. In particular, insertion of a donor molecule with methylated cytosines may be carried out using the methods described herein to achieve a state of transcriptional quiescence in a region of interest.

The exogenous (donor) polynucleotide may comprise any sequence of interest (exogenous sequence). Exemplary exogenous sequences include, but are not limited to, any polypeptide coding sequence (e.g., cDNAs), promoter sequences, enhancer sequences, epitope tags, marker genes, cleavage enzyme recognition sites and various types of expression constructs. Marker genes include, but are not limited to, sequences encoding proteins that mediate antibiotic resistance (e.g., ampicillin resistance, neomycin resistance, G418 resistance, puromycin resistance), sequences encoding colored or fluorescent or luminescent proteins (e.g., green fluorescent protein, enhanced green fluorescent protein, red fluorescent protein, luciferase), and proteins which mediate enhanced cell growth and/or gene amplification (e.g., dihydrofolate reductase). Epitope tags include, for example, one or more copies of FLAG, His, myc, Tap, HA or any detectable amino acid sequence.

In a preferred embodiment, the exogenous sequence (transgene) comprises a polynucleotide encoding any polypeptide of which expression in the cell is desired, including, but not limited to, antibodies, antigens, enzymes, receptors (cell surface or nuclear), hormones, lymphokines, cytokines, reporter polypeptides, growth factors, and functional fragments of any of the above. The coding sequences may be, for example, cDNAs.

For example, the exogenous sequence may comprise a sequence encoding a polypeptide that is lacking or non-functional in the subject having a genetic disease, including but not limited to any of the following genetic diseases: achondroplasia, achromatopsia, acid maltase deficiency, adenosine deaminase deficiency (OMIM No. 102700), adrenoleukodystrophy, Aicardi syndrome, alpha-1 antitrypsin deficiency, alpha-thalassemia, androgen insensitivity syndrome, Apert syndrome, arrhythmogenic right ventricular dysplasia, ataxia telangiectasia, Barth syndrome, beta-thalassemia, blue rubber bleb nevus syndrome, Canavan disease, chronic granulomatous diseases (CGD), cri du chat syndrome, cystic fibrosis, Dercum's disease, ectodermal dysplasia, Fanconi anemia, fibrodysplasia ossificans progressiva, fragile X syndrome, galactosemia, Gaucher's disease, generalized gangliosidoses (e.g., GM1), hemochromatosis, the hemoglobin C mutation in the 6th codon of beta-globin (HbC), hemophilia, Huntington's disease, Hurler Syndrome, hypophosphatasia, Klinefelter syndrome, Krabbes Disease, Langer-Giedion Syndrome, leukocyte adhesion deficiency (LAD, OMIM No. 116920), leukodystrophy, long QT syndrome, Marfan syndrome, Moebius syndrome, mucopolysaccharidosis (MPS), nail patella syndrome, nephrogenic diabetes insipidus, neurofibromatosis, Niemann-Pick disease, osteogenesis imperfecta, porphyria, Prader-Willi syndrome, progeria, Proteus syndrome, retinoblastoma, Rett syndrome, Rubinstein-Taybi syndrome, Sanfilippo syndrome, severe combined immunodeficiency (SCID), Shwachman syndrome, sickle cell disease (sickle cell anemia), Smith-Magenis syndrome, Stickler syndrome, Tay-Sachs disease, Thrombocytopenia Absent Radius (TAR) syndrome, Treacher Collins syndrome, trisomy, tuberous sclerosis, Turner's syndrome, urea cycle disorder, von Hippel-Lindau disease, Waardenburg syndrome, Williams syndrome, Wilson's disease, Wiskott-Aldrich syndrome, and X-linked lymphoproliferative syndrome (XLP, OMIM No. 308240).

Additional exemplary diseases that can be treated by targeted integration include acquired immunodeficiencies, lysosomal storage diseases (e.g., Gaucher's disease, GM1, Fabry disease and Tay-Sachs disease), mucopolysaccharidosis (e.g. Hunter's disease, Hurler's disease), hemoglobinopathies (e.g., sickle cell diseases, HbC, α-thalassemia, β-thalassemia) and hemophilias.

In certain embodiments, the exogenous sequences can comprise a marker gene (described above), allowing selection of cells that have undergone targeted integration, and a linked sequence encoding an additional functionality. Non-limiting examples of marker genes include GFP, drug selection marker(s) and the like.

Additional gene sequences that can be inserted may include, for example, wild-type genes to replace mutated sequences. For example, a wild-type Factor IX gene sequence may be inserted into the genome of a stem cell in which the endogenous copy of the gene is mutated. The wild-type copy may be inserted at the endogenous locus, or may alternatively be targeted to a safe harbor locus.

Construction of such expression cassettes, following the teachings of the present specification, utilizes methodologies well known in the art of molecular biology (see, for example, Ausubel or Maniatis). Before use of the expression cassette to generate a transgenic animal, the responsiveness of the expression cassette to the stress-inducer associated with selected control elements can be tested by introducing the expression cassette into a suitable cell line (e.g., primary cells, transformed cells, or immortalized cell lines).

Furthermore, although not required for expression, exogenous sequences may also include transcriptional or translational regulatory sequences, for example, promoters, enhancers, insulators, internal ribosome entry sites, sequences encoding 2A peptides and/or polyadenylation signals. Further, the control elements of the genes of interest can be operably linked to reporter genes to create chimeric genes (e.g., reporter expression cassettes).

Targeted insertion of non-coding nucleic acid sequence may also be achieved. Sequences encoding antisense RNAs, RNAi, shRNAs and micro RNAs (miRNAs) may also be used for targeted insertions.

In additional embodiments, the donor nucleic acid may comprise non-coding sequences that are specific target sites for additional nuclease designs. Subsequently, additional nucleases may be expressed in cells such that the original donor molecule is cleaved and modified by insertion of another donor molecule of interest. In this way, reiterative integrations of donor molecules may be generated allowing for trait stacking at a particular locus of interest or at a safe harbor locus.

Delivery

The nucleases, polynucleotides encoding these nucleases, donor polynucleotides and compositions comprising the proteins and/or polynucleotides described herein may be delivered in vivo or ex vivo by any suitable means into any cell type.

Suitable cells include eukaryotic (e.g., animal) and prokaryotic cells and/or cell lines. Non-limiting examples of such cells or cell lines generated from such cells include COS, CHO (e.g., CHO-S, CHO-K1, CHO-DG44, CHO-DUXB11, CHO-DUKX, CHOK1SV), VERO, MDCK, WI38, V79, B14AF28-G3, BHK, HaK, NSO, SP2/0-Ag14, HeLa, HEK293 (e.g., HEK293-F, HEK293-H, HEK293-T), and perC6 cells as well as insect cells such as Spodoptera frugiperda (Sf), or fungal cells such as Saccharomyces, Pichia and Schizosaccharomyces. In certain embodiments, the cell line is a CHO, MDCK or HEK293 cell line. Suitable cells also include stem cells such as, by way of example, embryonic stem cells, induced pluripotent stem cells, hematopoietic stem cells, neuronal stem cells and mesenchymal stem cells.

Methods of delivering nucleases as described herein are described, for example, in U.S. Pat. Nos. 6,453,242; 6,503,717; 6,534,261; 6,599,692; 6,607,882; 6,689,558; 6,824,978; 6,933,113; 6,979,539; 7,013,219; and 7,163,824, the disclosures of all of which are incorporated by reference herein in their entireties.

Nucleases and/or donor constructs as described herein may also be delivered using vectors containing sequences encoding one or more of the ZFN(s), TALEN(s) or CRISPR/Cas systems. Any vector systems may be used including, but not limited to, plasmid vectors, retroviral vectors, lentiviral vectors, adenovirus vectors, poxvirus vectors, herpesvirus vectors and adeno-associated virus vectors, etc. See, also, U.S. Pat. Nos. 6,534,261; 6,607,882; 6,824,978; 6,933,113; 6,979,539; 7,013,219; and 7,163,824, incorporated by reference herein in their entireties. Furthermore, it will be apparent that any of these vectors may comprise one or more of the sequences needed for treatment. Thus, when one or more nucleases and a donor construct are introduced into the cell, the nucleases and/or donor polynucleotide may be carried on the same vector or on different vectors (DNA MC(s)). When multiple vectors are used, each vector may comprise a sequence encoding one or multiple nucleases and/or donor constructs. Conventional viral and non-viral based gene transfer methods can be used to introduce nucleic acids encoding nucleases and/or donor constructs in cells (e.g., mammalian cells) and target tissues. Non-viral vector delivery systems include DNA or RNA plasmids, DNA MCs, naked nucleic acid, and nucleic acid complexed with a delivery vehicle such as a liposome or poloxamer. Suitable non-viral vectors include nanotaxis vectors, including vectors commercially available from InCellArt (France). Viral vector delivery systems include DNA and RNA viruses, which have either episomal or integrated genomes after delivery to the cell. For a review of in vivo delivery of engineered DNA-binding proteins and fusion proteins comprising these binding proteins, see, e.g., Rebar (2004) Expert Opinion Invest. Drugs 13(7):829-839; Rossi et al. (2007) Nature Biotech. 25(12):1444-1454 as well as general gene delivery references such as Anderson, Science 256:808-813 (1992); Nabel & Felgner, TIBTECH 11:211-217 (1993); Mitani & Caskey, TIBTECH 11:162-166 (1993); Dillon, TIBTECH 11:167-175 (1993); Miller, Nature 357:455-460 (1992); Van Brunt, Biotechnology 6(10):1149-1154 (1988); Vigne, Restorative Neurology and Neuroscience 8:35-36 (1995); Kremer & Perricaudet, British Medical Bulletin 51(1):31-44 (1995); Haddada et al., in Current Topics in Microbiology and Immunology Doerfler and Bohm (eds.) (1995); and Yu et al., Gene Therapy 1:13-26 (1994).

Methods of non-viral delivery of nucleic acids include electroporation, lipofection, microinjection, biolistics, virosomes, liposomes, immunoliposomes, polycation or lipid:nucleic acid conjugates, naked DNA, artificial virions, and agent-enhanced uptake of DNA. Sonoporation using, e.g., the Sonitron 2000 system (Rich-Mar) can also be used for delivery of nucleic acids.

Additional exemplary nucleic acid delivery systems include those provided by Amaxa Biosystems (Cologne, Germany), Maxcyte, Inc. (Rockville, Md.), BTX Molecular Delivery Systems (Holliston, Mass.) and Copernicus Therapeutics Inc. (see, for example, U.S. Pat. No. 6,008,336). Lipofection is described in, e.g., U.S. Pat. Nos. 5,049,386; 4,946,787; and 4,897,355, and lipofection reagents are sold commercially (e.g., Transfectam™ and Lipofectin™). Cationic and neutral lipids that are suitable for efficient receptor-recognition lipofection of polynucleotides include those of Felgner, WO 91/17424, WO 91/16024.

The preparation of lipid:nucleic acid complexes, including targeted liposomes such as immunolipid complexes, is well known to one of skill in the art (see, e.g., Crystal, Science 270:404-410 (1995); Blaese et al., Cancer Gene Ther. 2:291-297 (1995); Behr et al., Bioconjugate Chem. 5:382-389 (1994); Remy et al., Bioconjugate Chem. 5:647-654 (1994); Gao et al., Gene Therapy 2:710-722 (1995); Ahmad et al., Cancer Res. 52:4817-4820 (1992); U.S. Pat. Nos. 4,186,183, 4,217,344, 4,235,871, 4,261,975, 4,485,054, 4,501,728, 4,774,085, 4,837,028, and 4,946,787).

Additional methods of delivery include packaging the nucleic acids to be delivered into EnGeneIC delivery vehicles (EDVs). These EDVs are specifically delivered to target tissues using bispecific antibodies where one arm of the antibody has specificity for the target tissue and the other has specificity for the EDV. The antibody brings the EDVs to the target cell surface and then the EDV is brought into the cell by endocytosis. Once in the cell, the contents are released (see MacDiarmid et al (2009) Nature Biotechnology 27(7):643).

The use of RNA or DNA viral based systems for the delivery of nucleic acids encoding engineered ZFPs, TALEs and/or CRISPR/Cas systems takes advantage of highly evolved processes for targeting a virus to specific cells in the body and trafficking the viral payload to the nucleus. Viral vectors can be administered directly to patients (in vivo) or they can be used to treat cells in vitro, after which the modified cells are administered to patients (ex vivo). Conventional viral based systems for the delivery of ZFPs include, but are not limited to, retroviral, lentivirus, adenoviral, adeno-associated, vaccinia and herpes simplex virus vectors for gene transfer. Integration in the host genome is possible with the retrovirus, lentivirus, and adeno-associated virus gene transfer methods, often resulting in long term expression of the inserted transgene. Additionally, high transduction efficiencies have been observed in many different cell types and target tissues.

The tropism of a retrovirus can be altered by incorporating foreign envelope proteins, expanding the potential population of target cells. Lentiviral vectors are retroviral vectors that are able to transduce or infect non-dividing cells and typically produce high viral titers. Selection of a retroviral gene transfer system depends on the target tissue. Retroviral vectors are comprised of cis-acting long terminal repeats with packaging capacity for up to 6-10 kb of foreign sequence. The minimum cis-acting LTRs are sufficient for replication and packaging of the vectors, which are then used to integrate the therapeutic gene into the target cell to provide permanent transgene expression. Widely used retroviral vectors include those based upon murine leukemia virus (MuLV), gibbon ape leukemia virus (GaLV), Simian Immunodeficiency virus (SIV), human immunodeficiency virus (HIV), and combinations thereof (see, e.g., Buchscher et al., J. Virol. 66:2731-2739 (1992); Johann et al., J. Virol. 66:1635-1640 (1992); Sommerfelt et al., Virol. 176:58-59 (1990); Wilson et al., J. Virol. 63:2374-2378 (1989); Miller et al., J. Virol. 65:2220-2224 (1991); PCT/US94/05700).

In applications in which transient expression is preferred, adenoviral based systems can be used. Adenoviral based vectors are capable of very high transduction efficiency in many cell types and do not require cell division. With such vectors, high titer and high levels of expression have been obtained. This vector can be produced in large quantities in a relatively simple system. Adeno-associated virus (“AAV”) vectors are also used to transduce cells with target nucleic acids, e.g., in the in vitro production of nucleic acids and peptides, and for in vivo and ex vivo gene therapy procedures (see, e.g., West et al., Virology 160:38-47 (1987); U.S. Pat. No. 4,797,368; WO 93/24641; Kotin, Human Gene Therapy 5:793-801 (1994); Muzyczka, J. Clin. Invest. 94:1351 (1994)). Construction of recombinant AAV vectors is described in a number of publications, including U.S. Pat. No. 5,173,414; Tratschin et al., Mol. Cell. Biol. 5:3251-3260 (1985); Tratschin, et al., Mol. Cell. Biol. 4:2072-2081 (1984); Hermonat & Muzyczka, PNAS 81:6466-6470 (1984); and Samulski et al., J. Virol. 63:3822-3828 (1989).

At least six viral vector approaches are currently available for gene transfer in clinical trials, which utilize approaches that involve complementation of defective vectors by genes inserted into helper cell lines to generate the transducing agent.

pLASN and MFG-S are examples of retroviral vectors that have been used in clinical trials (Dunbar et al., Blood 85:3048-305 (1995); Kohn et al., Nat. Med. 1:1017-102 (1995); Malech et al., PNAS 94:22 12133-12138 (1997)). PA317/pLASN was the first therapeutic vector used in a gene therapy trial (Blaese et al., Science 270:475-480 (1995)). Transduction efficiencies of 50% or greater have been observed for MFG-S packaged vectors (Ellem et al., Immunol Immunother. 44(1):10-20 (1997); Dranoff et al., Hum. Gene Ther. 1:111-2 (1997)).

Recombinant adeno-associated virus vectors (rAAV) are a promising alternative gene delivery system based on the defective and nonpathogenic parvovirus adeno-associated type 2 virus. All vectors are derived from a plasmid that retains only the AAV 145 bp inverted terminal repeats flanking the transgene expression cassette. Efficient gene transfer and stable transgene delivery due to integration into the genomes of the transduced cell are key features for this vector system (Wagner et al., Lancet 351:9117 1702-3 (1998); Kearns et al., Gene Ther. 9:748-55 (1996)). Other AAV serotypes, including AAV1, AAV2, AAV3, AAV4, AAV5, AAV6, AAV7, AAV8, AAV9 and AAVrh.10 and any novel AAV serotype can also be used in accordance with the present invention.

Replication-deficient recombinant adenoviral vectors (Ad) can be produced at high titer and readily infect a number of different cell types. Most adenovirus vectors are engineered such that a transgene replaces the Ad E1a, E1b, and/or E3 genes; subsequently the replication defective vector is propagated in human 293 cells that supply deleted gene function in trans. Ad vectors can transduce multiple types of tissues in vivo, including nondividing, differentiated cells such as those found in liver, kidney and muscle. Conventional Ad vectors have a large carrying capacity. An example of the use of an Ad vector in a clinical trial involved polynucleotide therapy for antitumor immunization with intramuscular injection (Sterman et al., Hum. Gene Ther. 7:1083-9 (1998)). Additional examples of the use of adenovirus vectors for gene transfer in clinical trials include Rosenecker et al., Infection 24:1 5-10 (1996); Sterman et al., Hum. Gene Ther. 9:7 1083-1089 (1998); Welsh et al., Hum. Gene Ther. 2:205-18 (1995); Alvarez et al., Hum. Gene Ther. 5:597-613 (1997); Topf et al., Gene Ther. 5:507-513 (1998); Sterman et al., Hum. Gene Ther. 7:1083-1089 (1998).

Packaging cells are used to form virus particles that are capable of infecting a host cell. Such cells include 293 cells, which package adenovirus, and ψ2 cells or PA317 cells, which package retrovirus. Viral vectors used in gene therapy are usually generated by a producer cell line that packages a nucleic acid vector into a viral particle. The vectors typically contain the minimal viral sequences required for packaging and subsequent integration into a host (if applicable), other viral sequences being replaced by an expression cassette encoding the protein to be expressed. The missing viral functions are supplied in trans by the packaging cell line. For example, AAV vectors used in gene therapy typically only possess inverted terminal repeat (ITR) sequences from the AAV genome which are required for packaging and integration into the host genome. Viral DNA is packaged in a cell line, which contains a helper plasmid encoding the other AAV genes, namely rep and cap, but lacking ITR sequences. The cell line is also infected with adenovirus as a helper. The helper virus promotes replication of the AAV vector and expression of AAV genes from the helper plasmid. The helper plasmid is not packaged in significant amounts due to a lack of ITR sequences. Contamination with adenovirus can be reduced by, e.g., heat treatment to which adenovirus is more sensitive than AAV.

In many gene therapy applications, it is desirable that the gene therapy vector be delivered with a high degree of specificity to a particular tissue type. Accordingly, a viral vector can be modified to have specificity for a given cell type by expressing a ligand as a fusion protein with a viral coat protein on the outer surface of the virus. The ligand is chosen to have affinity for a receptor known to be present on the cell type of interest. For example, Han et al., Proc. Natl. Acad. Sci. USA 92:9747-9751 (1995), reported that Moloney murine leukemia virus can be modified to express human heregulin fused to gp70, and the recombinant virus infects certain human breast cancer cells expressing human epidermal growth factor receptor. This principle can be extended to other virus-target cell pairs, in which the target cell expresses a receptor and the virus expresses a fusion protein comprising a ligand for the cell-surface receptor. For example, filamentous phage can be engineered to display antibody fragments (e.g., Fab or Fv) having specific binding affinity for virtually any chosen cellular receptor. Although the above description applies primarily to viral vectors, the same principles can be applied to nonviral vectors. Such vectors can be engineered to contain specific uptake sequences which favor uptake by specific target cells.

Gene therapy vectors can be delivered in vivo by administration to an individual patient, typically by systemic administration (e.g., intravenous, intraperitoneal, intramuscular, subdermal, or intracranial infusion) or topical application, as described below. Alternatively, vectors can be delivered to cells ex vivo, such as cells explanted from an individual patient (e.g., lymphocytes, bone marrow aspirates, tissue biopsy) or universal donor hematopoietic stem cells, followed by reimplantation of the cells into a patient, usually after selection for cells which have incorporated the vector.

Vectors (e.g., retroviruses, adenoviruses, liposomes, etc.) containing nucleases and/or donor constructs can also be administered directly to an organism for transduction of cells in vivo. Alternatively, naked DNA can be administered. Administration is by any of the routes normally used for introducing a molecule into ultimate contact with blood or tissue cells including, but not limited to, injection, infusion, topical application and electroporation. Suitable methods of administering such nucleic acids are available and well known to those of skill in the art, and, although more than one route can be used to administer a particular composition, a particular route can often provide a more immediate and more effective reaction than another route.

Vectors suitable for introduction of polynucleotides (e.g., nuclease-encoding and/or double-stranded donors) described herein include non-integrating lentivirus vectors (IDLV). See, for example, Ory et al. (1996) Proc. Natl. Acad. Sci. USA 93:11382-11388; Dull et al. (1998) J. Virol. 72:8463-8471; Zufferey et al. (1998) J. Virol. 72:9873-9880; Follenzi et al. (2000) Nature Genetics 25:217-222; U.S. Patent Publication No 2009/0117617.

Pharmaceutically acceptable carriers are determined in part by the particular composition being administered, as well as by the particular method used to administer the composition. Accordingly, there is a wide variety of suitable formulations of pharmaceutical compositions available, as described below (see, e.g., Remington's Pharmaceutical Sciences, 17th ed., 1989).

It will be apparent that the nuclease-encoding sequences and donor constructs can be delivered using the same or different systems. For example, the nucleases and donors can be carried by the same DNA MC. Alternatively, a donor polynucleotide can be carried by a MC, while the one or more nucleases can be carried by a standard plasmid or AAV vector. Furthermore, the different vectors can be administered by the same or different routes (intramuscular injection, tail vein injection, other intravenous injection and/or intraperitoneal administration). The vectors can be delivered simultaneously or in any sequential order.

Thus, the instant disclosure includes in vivo or ex vivo treatment of diseases and conditions that are amenable to insertion of a transgene encoding a therapeutic protein. The compositions are administered to a human patient in an amount effective to obtain the desired concentration of the therapeutic polypeptide in the serum or the target organ or cells. Administration can be by any means in which the polynucleotides are delivered to the desired target cells. For example, both in vivo and ex vivo methods are contemplated. Intravenous injection to the portal vein is a preferred method of administration. Other in vivo administration modes include, for example, direct injection into the lobes of the liver or the biliary duct and intravenous injection distal to the liver, including through the hepatic artery, direct injection into the liver parenchyma, injection via the hepatic artery, and/or retrograde injection through the biliary tree. Ex vivo modes of administration include transduction in vitro of resected hepatocytes or other cells of the liver, followed by infusion of the transduced, resected hepatocytes back into the portal vasculature, liver parenchyma or biliary tree of the human patient, see e.g., Grossman et al., (1994) Nature Genetics, 6:335-341.

The effective amount of nuclease(s) and donor to be administered will vary from patient to patient and according to the therapeutic polypeptide of interest. Accordingly, effective amounts are best determined by the physician administering the compositions and appropriate dosages can be determined readily by one of ordinary skill in the art. After allowing sufficient time for integration and expression (typically 4-15 days, for example), analysis of the serum or other tissue levels of the therapeutic polypeptide and comparison to the initial level prior to administration will determine whether the amount being administered is too low, within the right range or too high. Suitable regimes for initial and subsequent administrations are also variable, but are typified by an initial administration followed by subsequent administrations if necessary. Subsequent administrations may be administered at variable intervals, ranging from daily to annually to every several years. One of skill in the art will appreciate that appropriate immunosuppressive techniques may be recommended to avoid inhibition or blockage of transduction by immunosuppression of the delivery vectors, see e.g., Vilquin et al., (1995) Human Gene Ther., 6:1391-1401.

Formulations for both ex vivo and in vivo administrations include suspensions in liquid or emulsified liquids. The active ingredients often are mixed with excipients which are pharmaceutically acceptable and compatible with the active ingredient. Suitable excipients include, for example, water, saline, dextrose, glycerol, ethanol or the like, and combinations thereof. In addition, the composition may contain minor amounts of auxiliary substances, such as wetting or emulsifying agents, pH buffering agents, stabilizing agents or other reagents that enhance the effectiveness of the pharmaceutical composition.

Cells

Also described herein are cells and/or cell lines in which an endogenous BCL11A enhancer sequence is modified. The modification may be, for example, as compared to the wild-type sequence of the cell. The cell or cell lines may be heterozygous or homozygous for the modification. The modifications to the BCL11A sequence may comprise insertions, deletions and/or combinations thereof.

The BCL11A enhancer sequence may be modified by a nuclease (e.g., ZFN, TALEN, CRISPR/Cas system, TtAgo system, etc.), for example a nuclease as described herein. In certain embodiments, the BCL11A enhancer is modified anywhere between exon 2 and exon 3. In other embodiments, the BCL11A enhancer is modified in the regions shown in SEQ ID NO:1, SEQ ID NO:2 or SEQ ID NO:3 (FIG. 11). The modification is preferably at or near the nuclease(s) binding and/or cleavage site(s), for example, within 1-300 (or any value therebetween) base pairs upstream or downstream of the site(s) of cleavage, more preferably within 1-100 base pairs (or any value therebetween) of either side of the binding and/or cleavage site(s), even more preferably within 1 to 50 base pairs (or any value therebetween) on either side of the binding and/or cleavage site(s). In certain embodiments, the modification is at or near the “+58” region of the BCL11A enhancer, for example, at or near a nuclease binding site shown in any of SEQ ID NOs:4 to 80 and 276. In other embodiments, the modification is at or near the “+55” region of the BCL11A enhancer, for example, at or near a nuclease site shown in any of SEQ ID NOs:143 to 184 and 232-251.

Any cell or cell line may be modified, for example a stem cell, for example an embryonic stem cell, an induced pluripotent stem cell, a hematopoietic stem cell, a neuronal stem cell and a mesenchymal stem cell. Other non-limiting examples of cells as described herein include T-cells (e.g., CD4+, CD3+, CD8+, etc.); dendritic cells; B-cells. A descendent of a stem cell, including a partially or fully differentiated cell, is also provided (e.g., a RBC or RBC precursor cell). Non-limiting examples of other cell lines including a modified BCL11A sequence include COS, CHO (e.g., CHO-S, CHO-K1, CHO-DG44, CHO-DUXB11, CHO-DUKX, CHOK1SV), VERO, MDCK, WI38, V79, B14AF28-G3, BHK, HaK, NSO, SP2/0-Ag14, HeLa, HEK293 (e.g., HEK293-F, HEK293-H, HEK293-T), and perC6 cells as well as insect cells such as Spodoptera frugiperda (Sf), or fungal cells such as Saccharomyces, Pichia and Schizosaccharomyces.

The cells as described herein are useful in treating and/or preventing a disorder, for example, by ex vivo therapies. The nuclease-modified cells can be expanded and then reintroduced into the patient using standard techniques. See, e.g., Tebas et al (2014) New Eng J Med 370(10):901. In the case of stem cells, after infusion into the subject, in vivo differentiation of these precursors into cells expressing the functional transgene also occurs. Pharmaceutical compositions comprising the cells as described herein are also provided. In addition, the cells may be cryopreserved prior to administration to a patient.

Any of the modified cells or cell lines disclosed herein may show increased expression of gamma globin. Compositions such as pharmaceutical compositions comprising the genetically modified cells as described herein are also provided.

Applications

The methods and compositions disclosed herein are for modifying expression of a protein, or correcting an aberrant gene sequence that encodes a protein expressed in a genetic disease, such as a sickle cell disease or a thalassemia. Thus, the methods and compositions provide for the treatment and/or prevention of such genetic diseases. Genome editing, for example of stem cells, can be used to correct an aberrant gene, insert a wild type gene, or change the expression of an endogenous gene. By way of non-limiting example, a wild type gene, e.g., encoding at least one globin (e.g., α and/or β globin), may be inserted into a cell (e.g., into an endogenous BCL11A enhancer sequence using one or more nucleases as described herein) to provide the globin proteins deficient and/or lacking in the cell and thereby treat a genetic disease, e.g., a hemoglobinopathy, caused by faulty globin expression. Alternatively or in addition, genomic editing, with or without administration of the appropriate donor, can correct the faulty endogenous gene, e.g., correcting the point mutation in α- or β-hemoglobin, to restore expression of the gene and/or treat a genetic disease, e.g., sickle cell disease, and/or knock out or alteration (overexpression or repression) of any direct or indirect globin regulatory gene (e.g., inactivation of the γ globin-regulating gene BCL11A or the BCL11A-regulator KLF1). Specifically, the methods and compositions of the invention have use in the treatment or prevention of hemoglobinopathies.

The nucleases of the invention are targeted to the BCL11A enhancer region, known to be required for the expression of BCL11A, and hence the down regulation of gamma globin expression. Modification of this enhancer region may result in erythrocytes with increased gamma globin expression, and thus may be helpful for the treatment or prevention of sickle cell disease or beta thalassemia.

The following Examples relate to exemplary embodiments of the present disclosure in which the nuclease comprises a zinc finger nuclease (ZFN) or TALEN. It will be appreciated that this is for purposes of exemplification only and that other nucleases can be used, for example TtAgo and CRISPR/Cas systems, homing endonucleases (meganucleases) with engineered DNA-binding domains and/or fusions of naturally occurring or engineered homing endonucleases (meganucleases) DNA-binding domains and heterologous cleavage domains and/or fusions of meganucleases and TALE proteins.

EXAMPLES

Example 1: Assembly of Zinc Finger Nucleases and TALEN Nucleases

ZFNs were assembled against the human BCL11A gene and were tested by CEL1 assays as described in Miller et al. (2007) Nat. Biotechnol. 25:778-785. TALENs were assembled as described in Miller et al (2011) Nature Biotechnology 29 (2): 143-151. Additionally, see co-owned U.S. Patent Publication No. 20140093913 and U.S. Pat. No. 8,586,526. The TALENs were assembled with the +63 architecture.

Example 2: Introduction of Deletions in the +55, +58 and +62 BCL11A Enhancer Regions

To test which regions of the BCL11A intron 2 (FIG. 1) enhancer region were required for repression of gamma globin during erythropoiesis, a series of TALENs were made to target sections of these regions (FIG. 2). The TALEN pairs are shown below in Table 1. Nucleotides in the target site that are contacted by the nuclease are indicated in uppercase letters; non-contacted nucleotides are indicated in lowercase.

TABLE 1
TALENs targeted to the BCL11A enhancer region

Sample | SBS# | target 5′→3′ | # of RVDs | SEQ ID NO: | N→C RVD Sequence
55R | 102740 | ctACATAGAGGCCCTTCCTgc | 17 | 276 | NI-HD-NI-NG-NI-NN-NI-NN-NN-HD-HD-HD-NG-NG-HD-HD-NG
55R | 102741 | gtGGAGGGGATAACTGGGTca | 17 | 4 | NN-NN-NI-NN-NN-NN-NN-NI-NG-NI-NI-HD-NG-NN-NN-NN-NG
55M | 102736 | ttGTGTGCTTGGTCGGCACtg | 17 | 5 | NN-NG-NN-NG-NN-HD-NG-NG-NN-NN-NG-HD-NN-NN-HD-NI-HD
55M | 102737 | gtGCCGACAACTCCCTACCgc | 17 | 6 | NN-HD-HD-NN-NI-HD-NI-NI-HD-NG-HD-HD-HD-NG-NI-HD-HD
58R | 102756 | gtGCCGACAACTCCCTACCgc | 17 | 6 | HD-NG-NG-NN-NN-NG-NN-NI-NG-NN-NN-NI-NN-NI-NI-NG-NG
58R | 102757 | atTATTTCATTCCCATTGAga | 17 | 7 | NG-NI-NG-NG-NG-HD-NI-NG-NG-HD-HD-HD-NI-NG-NG-NN-NI
58M | 102752 | atAGGCCAGAAAAGAGATAtg | 17 | 8 | NI-NN-NN-HD-HD-NI-NN-NI-NI-NI-NI-NN-NI-NN-NI-NG-NI
58M | 102753 | ctGGTGTGTTATGTCTAAGag | 17 | 9 | NN-NN-NG-NN-NG-NN-NG-NG-NI-NG-NN-NG-HD-NG-NI-NI-NK
58L | 102750 | ctAGTTTATAGGGGGTTCTac | 17 | 10 | NI-NN-NG-NG-NG-NI-NG-NI-NN-NN-NN-NN-NN-NG-NG-HD-NG
58L | 102751 | atAGCACCCAAGGTCCATCag | 17 | 11 | NI-NN-HD-NI-HD-HD-HD-NI-NI-NN-NN-NG-HD-HD-NI-NG-HD
62R | 102775 | atTCAACAAATAGCATATAaa | 17 | 12 | NG-HD-NI-NI-HD-NI-NI-NI-NG-NI-NN-HD-NI-NG-NI-NG-NI
62R | 102774 | ctTCCCTTTTAGGAAGGTAaa | 17 | 13 | NG-HD-HD-HD-NG-NG-NG-NG-NI-NN-NN-NI-NI-NN-NN-NG-NI
62L | 102795 | atGCCAGAGGGCAGCAAACat | 17 | 14 | NN-HD-HD-NI-NN-NI-NN-NN-NN-HD-NI-NN-HD-NI-NI-NI-HD
62L | 102794 | ctTAATAGCTGAAGGGGGCca | 17 | 15 | NG-NI-NI-NG-NI-NN-HD-NG-NN-NI-NI-NN-NN-NN-NN-NN-HD
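The relationship between an RVD string and the uppercase (contacted) nucleotides of its target site can be illustrated with a short sketch. The RVD-to-base mapping below (NI→A, HD→C, NG→T, NN→G, NK→G) is the commonly cited TALE recognition code and is an assumption of this sketch, not something stated in Table 1; NN can also recognize A, which is ignored here. The function names are hypothetical.

```python
# Commonly cited RVD-to-base preferences (assumption of this sketch;
# NN may also recognize A, which is not modeled here).
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G", "NK": "G"}


def decode_rvds(rvd_string):
    """Translate a hyphen-separated N->C RVD string into the bases it is expected to contact."""
    return "".join(RVD_TO_BASE[rvd] for rvd in rvd_string.split("-"))


def contacted_bases(target):
    """Keep only the uppercase (nuclease-contacted) nucleotides of a target site."""
    return "".join(ch for ch in target if ch.isupper())


def rvds_match_target(rvd_string, target):
    """Return True when the decoded RVDs equal the contacted part of the target."""
    return decode_rvds(rvd_string) == contacted_bases(target)


# SBS# 102740 (sample 55R), values copied from Table 1.
target_102740 = "ctACATAGAGGCCCTTCCTgc"
rvds_102740 = "NI-HD-NI-NG-NI-NN-NI-NN-NN-HD-HD-HD-NG-NG-HD-HD-NG"

print(decode_rvds(rvds_102740))                       # ACATAGAGGCCCTTCCT
print(rvds_match_target(rvds_102740, target_102740))  # True
```

A check of this kind may be useful for spotting transcription errors when a table row is copied, since a mismatch between the decoded RVDs and the uppercase target bases would be flagged.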

Human K562 cells were cultured in DMEM supplemented with 10% FBS and 200,000 cells were transfected with 800 ng of plasmid DNA encoding the TALENs by Amaxa Nucleofector® following the manufacturer's instructions. The Cel-I assay (Surveyor™, Transgenomics), as described in Perez et al. (2008) Nat. Biotechnol. 26: 808-816 and Guschin et al. (2010) Methods Mol Biol. 649:247-56, was used to detect TALEN-induced modifications of the target gene. In this assay, PCR-amplification of the target site was followed by quantification of insertions and/or deletions (indels) using the mismatch detecting enzyme Cel-I (Yang et al. (2000) Biochemistry 39: 3533-3541), which provided a lower-limit estimate of DSB frequency. Deep sequencing on the Illumina platform (“miSEQ”) was used according to the manufacturer's instructions to measure editing efficiency as well as the nature of editing-generated alleles. To detect deletions following cell treatment with more than one nuclease pair, a PCR-based assay was used in which bulk genomic DNA is amplified with primers that flank the region to be deleted, and a gel is used to separate the PCR product derived from the wild-type allele and the one derived from the deletion-bearing allele. All designs shown in Table 1 were active.
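As an illustration of how band intensities from a Cel-I digest are often converted into an estimated indel percentage, the sketch below applies a widely used conversion, % indels = 100 × (1 − √(1 − fraction cleaved)). This formula is not stated in the present disclosure and is an assumption of the sketch; the band intensities and the function name are hypothetical.

```python
import math


def percent_indels(parent_band, cleaved_bands):
    """Estimate % indels from gel band intensities of a Cel-I digest.

    parent_band: intensity of the uncleaved PCR product.
    cleaved_bands: intensities of the cleavage products.
    Uses the commonly applied conversion 100 * (1 - sqrt(1 - fraction_cleaved)),
    which is an assumption of this sketch.
    """
    cleaved = sum(cleaved_bands)
    total = parent_band + cleaved
    if total <= 0:
        raise ValueError("total band intensity must be positive")
    fraction_cleaved = cleaved / total
    return 100.0 * (1.0 - math.sqrt(1.0 - fraction_cleaved))


# Hypothetical intensities: 64 units uncleaved, two cleavage bands of 18 units each.
print(round(percent_indels(64.0, [18.0, 18.0]), 2))  # 20.0
```

The square root reflects that heteroduplexes form by random reannealing of modified and unmodified strands, so the cleaved fraction is not a direct readout of the modified fraction.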

Three days following transfection of the TALEN expression vector at standard conditions (37° C.), genomic DNA was isolated from K562 cells using the DNeasy kit (Qiagen) or QuickExtract (Epicentre) and subjected to PCR amplification.

The results from the Cel-I assay demonstrated that the TALENs were capable of inducing cleavage at their respective target sites.

To test the effect on relative gamma globin expression, the mRNAs encoding the TALEN pairs were introduced into CD34+ cells (obtained from healthy donor volunteers) by BTX nucleofection according to manufacturer's instructions. The cells were then differentiated into erythrocytes. Briefly, CD34+ cells were purified using Ficoll-Paque (GE Healthcare) and CD34⁺ microbeads (Miltenyi Biotec) according to the manufacturers' instructions. CD34⁺ cells were cultured in Iscove's MDM with BIT 95000 (StemCell Technologies) in the presence of growth factors. Cells were differentiated toward the erythroid lineage using a 3 step liquid culture model. During the first 6 days (first phase), CD34⁺ cells were expanded with SCF (100 ng/ml), Flt3-L (100 ng/ml), and IL-3 (20 ng/ml). Expanded cells were then committed and differentiated toward the erythroid lineage (second phase) with Epo (2 U/ml) and SCF (50 ng/ml). See, Giarratana et al. (2011) Blood 118(19):5071-9.

To analyze relative gamma globin expression, the ratios of mRNAs encoding gamma globin and beta globin following TALEN treatment were determined at 14 days following TALEN introduction by Taqman® analysis. The analysis was done by standard Taqman® analysis, following the protocol and using gene specific assays supplied by the manufacturer (Applied Biosystems) and the primer sets supplied. The relative levels of gamma globin were normalized by the level of alpha or beta globin expression where the ratio was compared to the alpha/beta or gamma/beta ratio in untreated cells. The results (FIGS. 3, 4 and 5) demonstrated that deletions of regions within the +58 and +55 BCL11A DNAseI hypersensitive site resulted in an increase in the relative levels of gamma globin expression in these experiments.
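The normalization described above can be sketched as a ratio of ratios: the gamma/beta mRNA ratio in treated cells divided by the same ratio in untreated cells. The sketch assumes that relative mRNA quantities have already been derived from the Taqman® data; the numeric values and the function name are hypothetical and are not results from these experiments.

```python
def relative_gamma_expression(gamma_treated, beta_treated,
                              gamma_untreated, beta_untreated):
    """Fold change of the gamma/beta globin mRNA ratio versus untreated cells.

    Inputs are relative mRNA quantities (arbitrary units); how they are
    derived from the raw assay signal is outside the scope of this sketch.
    """
    if min(beta_treated, beta_untreated, gamma_untreated) <= 0:
        raise ValueError("reference quantities must be positive")
    treated_ratio = gamma_treated / beta_treated
    untreated_ratio = gamma_untreated / beta_untreated
    return treated_ratio / untreated_ratio


# Hypothetical quantities: gamma/beta is 0.5 in treated cells and 0.25 in untreated cells.
print(relative_gamma_expression(2.0, 4.0, 1.0, 4.0))  # 2.0
```

A value above 1 would indicate a higher gamma/beta ratio in treated cells than in untreated cells; the same calculation applies if alpha globin is used as the reference instead of beta globin.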

Example 3: TALEN “Walk Across” the +58 DNAse I Hypersensitive Site in BCL11A

To further define the area required for enhancer activity in the +58 region, a series of TALENs were made to create a series of DSBs and deletions across this stretch of DNA. The TALENs used in this experiment are shown below in Table 2.

TABLE 2
TALENs used in the +58 enhancer walk

Sample | SBS# | target 5′→3′ | # of RVDs | SEQ ID NO: | N→C RVD Sequence
1 | 102830 | gtGTGCATAAGTAAGAGCAga | 17 | 16 | NN-NG-NN-HD-NI-NG-NI-NI-NN-NG-NI-NI-NN-NI-NN-HD-NI
1 | 102831 | ctGTATGGACTTTGCACTGga | 17 | 17 | NN-NG-NI-NG-NN-NN-NI-HD-NG-NG-NG-NN-HD-NI-HD-NG-NK
2 | 102832 | gtAAGAGCAGATAGCTGATtc | 17 | 18 | NI-NI-NN-NI-NN-HD-NI-NN-NI-NG-NI-NN-HD-NG-NN-NI-NG
2 | 102833 | atGTTATTACCTGTATGGAct | 17 | 19 | NN-NG-NG-NI-NG-NG-NI-HD-HD-NG-NN-NG-NI-NG-NN-NN-NI
3 | 102834 | atAGCTGATTCCAGTGCAAag | 17 | 20 | NI-NN-HD-NG-NN-NI-NG-NG-HD-HD-NI-NN-NG-NN-HD-NI-NI
3 | 102835 | ttTTCTGGCCTATGTTATTac | 17 | 21 | NG-NG-HD-NG-NN-NN-HD-HD-NG-NI-NG-NN-NG-NG-NI-NG-NG
4 | 102836 | gtGCAAAGTCCATACAGGTaa | 17 | 22 | NN-HD-NI-NI-NI-NN-NG-HD-HD-NI-NG-NI-HD-NI-NN-NN-NG
4 | 102837 | atGCCATATCTCTTTTCTGgc | 17 | 23 | NN-HD-HD-NI-NG-NI-NG-HD-NG-HD-NG-NG-NG-NG-HD-NG-NK
5 | 102838 | atACAGGTAATAACATAGGcc | 17 | 24 | NI-HD-NI-NN-NN-NG-NI-NI-NG-NI-NI-HD-NI-NG-NI-NN-NK
5 | 102839 | ctAAGAGTAGATGCCATATct | 17 | 25 | NI-NI-NN-NI-NN-NG-NI-NN-NI-NG-NN-HD-HD-NI-NG-NI-NG
6 | 102840 | atAACATAGGCCAGAAAAGag | 17 | 26 | NI-NI-HD-NI-NG-NI-NN-NN-HD-HD-NI-NN-NI-NI-NI-NI-NK
6 | 102841 | gtGTTATGTCTAAGAGTAGat | 17 | 27 | NN-NG-NG-NI-NG-NN-NG-HD-NG-NI-NI-NN-NI-NN-NG-NI-NK
7 | 102842 | ctCTTAGACATAACACACCag | 17 | 28 | HD-NG-NG-NI-NN-NI-HD-NI-NG-NI-NI-HD-NI-HD-NI-HD-HD
7 | 102843 | ctAGACTAGCTTCAAAGTTgt | 17 | 29 | NI-NN-NI-HD-NG-NI-NN-HD-NG-NG-HD-NI-NI-NI-NN-NG-NG
8 | 102844 | atAACACACCAGGGTCAATac | 17 | 30 | NI-NI-HD-NI-HD-NI-HD-HD-NI-NN-NN-NN-NG-HD-NI-NI-NG
8 | 102845 | gtTAGCTTGCACTAGACTAgc | 17 | 31 | NG-NI-NN-HD-NG-NG-NN-HD-NI-HD-NG-NI-NN-NI-HD-NG-NI
9 | 102846 | gtCAATACAACTTTGAAGCta | 17 | 32 | HD-NI-NI-NG-NI-HD-NI-NI-HD-NG-NG-NG-NN-NI-NI-NN-HD
9 | 102847 | atAAAAGCAACTGTTAGCtt | 17 | 33 | NI-NI-NI-NI-NN-HD-NI-NI-HD-NG-NN-NG-NG-NI-NN-HD
10 | 102848 | ttGAAGCTAGTCTAGTGCAag | 17 | 34 | NN-NI-NI-NN-HD-NG-NI-NN-NG-HD-NG-NI-NN-NG-NN-HD-NI
10 | 102849 | ctGGAGCCTGTGATAAAAGca | 17 | 35 | NN-NN-NI-NN-HD-HD-NG-NN-NG-NN-NI-NG-NI-NI-NI-NI-NK
11 | 102850 | ctAGTCTAGTGCAAGCTAac | 17 | 36 | NI-NN-NG-HD-NG-NI-NN-NG-NN-HD-NI-NI-NN-HD-NG-NI
11 | 102851 | ctTCCTGGAGCCTGTGATAaa | 17 | 37 | NG-HD-HD-NG-NN-NN-NI-NN-HD-HD-NG-NN-NG-NN-NI-NG-NI
12 | 102852 | gtGCAAGCTAACAGTTGCTtt | 17 | 38 | NN-HD-NI-NI-NN-HD-NG-NI-NI-HD-NI-NN-NG-NG-NN-HD-NG
12 | 102853 | atCAGAGGCCAAACCCTTCct | 17 | 39 | HD-NI-NN-NI-NN-NN-HD-HD-NI-NI-NI-HD-HD-HD-NG-NG-HD
13 | 102854 | ctAACAGTTGCTTTTATCAca | 17 | 40 | NI-NI-HD-NI-NN-NG-NG-NN-HD-NG-NG-NG-NG-NI-NG-HD-NI
13 | 102855 | ctAATCAGAGGCCAAACCCtt | 17 | 41 | NI-NI-NG-HD-NI-NN-NI-NN-NN-HD-HD-NI-NI-NI-HD-HD-HD
14 | 102856 | atCACAGGCTCCAGGAAGGgt | 17 | 42 | HD-NI-HD-NI-NN-NN-HD-NG-HD-HD-NI-NN-NN-NI-NI-NN-NK
14 | 102857 | ctACCCCACCCACGCCCCCac | 17 | 43 | NI-HD-HD-HD-HD-NI-HD-HD-HD-NI-HD-NN-HD-HD-HD-HD-HD
15 | 102858 | ctCCAGGAAGGGTTTGGCCtc | 17 | 44 | HD-HD-NI-NN-NN-NI-NI-NN-NN-NN-NG-NG-NG-NN-NN-HD-HD
15 | 102859 | ctACCCCACCCACGCCCCCac | 17 | 45 | NI-HD-HD-HD-HD-NI-HD-HD-HD-NI-HD-NN-HD-HD-HD-HD-HD
16 | 102860 | ttGGCCTCTGATTAGGGTGgg | 17 | 46 | NN-NN-HD-HD-NG-HD-NG-NN-NI-NG-NG-NI-NN-NN-NN-NG-NK
16 | 102861 | ctGCCAGTCCTCTTCTACCcc | 17 | 47 | NN-HD-HD-NI-NN-NG-HD-HD-NG-HD-NG-NG-HD-NG-NI-HD-HD
17 | 102862 | atTAGGGTGGGGGCGTGGGtg | 17 | 48 | NG-NI-NN-NN-NN-NG-NN-NN-NN-NN-NN-HD-NN-NG-NN-NN-NK
17 | 102863 | atGGAGAGGTCTGCCAGTCct | 17 | 49 | NN-NN-NI-NN-NI-NN-NN-NG-HD-NG-NN-HD-HD-NI-NN-NG-HD
18 | 102864 | gtGGGGTAGAAGAGGACTGgc | 17 | 50 | NN-NN-NN-NN-NG-NI-NN-NI-NI-NN-NI-NN-NN-NI-HD-NG-NK
18 | 102865 | ctGGGCAAACGGCCACCGAtg | 17 | 51 | NN-NN-NN-HD-NI-NI-NI-HD-NN-NN-HD-HD-NI-HD-HD-NN-NI
19 | 102866 | ctGGCAGACCTCTCCATCGgt | 17 | 52 | NN-NN-HD-NI-NN-NI-HD-HD-NG-HD-NG-HD-HD-NI-NG-HD-NK
19 | 102867 | ctTCCGAAAGAGGCCCCCCtg | 17 | 53 | NG-HD-HD-NN-NI-NI-NI-NN-NI-NN-NN-HD-HD-HD-HD-HD-HD
20 | 102868 | atCGGTGGCCGTTTGCCCag | 16 | 54 | HD-NN-NN-NG-NN-NN-HD-HD-NN-NG-NG-NG-HD
20 | 102869 | atCACCAAGAGAGCCTTCCga | 17 | 55 | HD-NI-HD-HD-NI-NI-NN-NI-NN-NI-NN-HD-HD-NG-NG-HD-HD
21 | 102870 | gtTTGCCCAGGGGGGCCTCtt | 17 | 56 | NG-NG-NN-HD-HD-HD-NI-NN-NN-NN-NN-NN-NN-HD-HD-NG-HD
21 | 102871 | atTCTCCATCACCAAGAGAgc | 17 | 57 | NG-HD-NG-HD-HD-NI-NG-HD-NI-HD-HD-NI-NI-NN-NI-NN-NI
22 | 102872 | ttGCCCAGGGGGGCCTCTTtc | 17 | 58 | NN-HD-HD-HD-NI-NN-NN-NN-NN-NN-NN-HD-HD-NG-HD-NG-NG
22 | 102873 | atAAAATCCAATTCTCCATca | 17 | 59 | NI-NI-NI-NI-NG-HD-HD-NI-NI-NG-NG-HD-NG-HD-HD-NI-NG
23 | 102874 | ctTTCGGAAGGCTCTCTTGgt | 17 | 60 | NG-NG-HD-NN-NN-NI-NI-NN-NN-HD-NG-HD-NG-HD-NG-NG-NK
23 | 102875 | atTGAGAAATAAAATCCAAtt | 17 | 61 | NG-NN-NI-NN-NI-NI-NI-NG-NI-NI-NI-NI-NG-HD-HD-NI-NI
58R | 102756 | ctCTTGGTGATGGAGAATTgg | 17 | 62 | HD-NG-NG-NN-NN-NG-NN-NI-NG-NN-NN-NI-NN-NI-NI-NG-NG
58R | 102757 | atTATTTCATTCCCATTGAga | 17 | 7 | NG-NI-NG-NG-NG-HD-NI-NG-NG-HD-HD-HD-NI-NG-NG-NN-NI
58M | 102752 | atAGGCCAGAAAAGAGATAtg | 17 | 8 | NI-NN-NN-HD-HD-NI-NN-NI-NI-NI-NI-NN-NI-NN-NI-NG-NI
58M | 102753 | ctGGTGTGTTATGTCTAAGag | 17 | 9 | NN-NN-NG-NN-NG-NN-NG-NG-NI-NG-NN-NG-HD-NG-NI-NI-NK
58L | 102750 | ctAGTTTATAGGGGGTTCTac | 17 | 10 | NI-NN-NG-NG-NG-NI-NG-NI-NN-NN-NN-NN-NN-NG-NG-HD-NG
58L | 102751 | atAGCACCCAAGGTCCATCag | 17 | 11 | NI-NN-HD-NI-HD-HD-HD-NI-NI-NN-NN-NG-HD-HD-NI-NG-HD

In this table, ‘Sample’ refers to the samples shown in FIG. 6. The results demonstrate that the TALEN pair 102853/102852 (indicated by the arrow in the figure) was able to increase relative gamma expression. Further, large deletions introduced by combining TALEN pairs (Sample 24: pair from Sample 22 + pair from Sample 6; Sample 25: pair from Sample 16 + pair from Sample 6; Sample 26: pair from Sample 22 + pair from Sample 16) were also able to increase relative gamma expression. The TALENs in this study were engineered to probe throughout the +58 region (see FIG. 8 depicting the enhancer sequence and the TALEN cleavage sites). All designs shown in Table 2 were active.
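The relationship between the RVD column and the target column of Table 2 can be illustrated with a minimal sketch. It assumes the commonly used RVD recognition code (NI for A, HD for C, NG for T, NN and NK for G; NN is also known to tolerate A, which this sketch ignores) and assumes that the upper-case letters of each target are the bases contacted by the repeat units. The function names are hypothetical and are not part of the disclosure.

```python
# Assumed canonical RVD-to-base code; NN is treated as G only in this sketch.
RVD_TO_BASE = {"NI": "A", "HD": "C", "NG": "T", "NN": "G", "NK": "G"}


def decode_rvds(rvd_string):
    """Translate a hyphen-separated RVD string (N to C) into the DNA it should read."""
    rvds = [r.strip() for r in rvd_string.split("-") if r.strip()]
    return "".join(RVD_TO_BASE[r] for r in rvds)


def bound_bases(target):
    """Return the upper-case portion of a target such as 'gtGTGC...ga'."""
    return "".join(ch for ch in target if ch.isupper())


# Sample 1, SBS# 102830 from Table 2
rvds_102830 = "NN-NG-NN-HD-NI-NG-NI-NI-NN-NG-NI-NI-NN-NI-NN-HD-NI"
target_102830 = "gtGTGCATAAGTAAGAGCAga"
print(decode_rvds(rvds_102830) == bound_bases(target_102830))
```

Under these assumptions, the decoded RVD string for SBS# 102830 reproduces the 17 upper-case bases of its listed target.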

Example 4: ZFNs Targeted to the +58 Enhancer Region of BCL11A

In parallel to the TALEN pairs described in Example 3, ZFN pairs were made to target the +58 region. The ZFNs used are shown below in Table 3. The nucleases are identified by their “SBS” number, a unique numeric identifier for each protein.

TABLE 3 ZFN pairs specific for +58 BCL11A enhancer region SBS # (targetsite, 5′- Design 3′) F1 F2 F3 F4 F5 F6 45796 RSDNLSE TRSPLRN RSDDLTRQKSNLSS QSAHRKN DSSHRTR atGGCTGAAA (SEQ ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID AGCGATACAG NO: 81) NO: 82) NO: 83) NO: 84) NO: 85)NO: 86) ggctggct (SEQ ID NO: 63) 45795 DSSDRKK DRSNRTT TNSNRKR QSGDLTRLKDTLRR N/A tcACTACAGA (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDTAACTCCcaa NO: 87) NO: 88) NO: 89) NO: 90) NO: 91) gtcctgtc (SEQ IDNO: 64) 45802 GYCCLRD TSGNLTR QSGDLTR QRTHLKA QSGALAR QSANRTK caTAAGTAAG(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID AGCAGATAGC NO: 92)NO: 93) NO: 90) NO: 94) NO: 95) NO: 96) tgattcca (SEQ ID NO: 65) 45800DSSDRKK QNAHRKT QSGDLTR RSDHLSR QQWDRKQ N/A caCCTGGGGC (SEQ ID (SEQ ID(SEQ ID (SEQ ID (SEQ ID AtAGAGCCag NO: 87) NO: 97) NO: 90) NO: 98)NO: 99) ccctgtat (SEQ ID NO: 66) 45812 RSDYLSK TSSVRTT TNQNLTV TSGHLSRRSADLTR TNQNRIT tcCATACAGG (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID TAATaACATA NO: 100) NO: 101) NO: 102) NO: 103) NO: 104) NO: 105)Ggccagaa (SEQ ID NO: 67) 45813 QSGALAR RLDWLPM QSGDLTR HKWVLRQ N/A N/AacTTTGCACT (SEQ ID (SEQ ID (SEQ ID (SEQ ID GGAAtcagct NO: 95) NO: 106)NO: 90) NO: 107) atctgctc (SEQ ID NO: 68) 45816 GYCCLRD TSGNLTR QSGDLTRQRTHLKA QSGALAR N/A aaGTAAGAGC (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDAGATAGCtga NO: 92) NO: 93) NO: 90) NO: 94) NO: 95) ttccagtg (SEQ IDNO: 69) 45815 DSSDRKK QNAHRKT LKQNLDA RSAHLSR RSDVLST DTRNLRA caCACCTGGG(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID GCATAGAGCC NO: 87)NO: 97) NO: 108) NO: 109) NO: 110) NO: 111) agccctgt (SEQ ID NO: 70)45844 LRHHLTR RRDNLHS RSDHLSN DSRSRIN DRSHLTR QSGTRKT tcACAGGCTC (SEQ ID(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID CAGGAAGGGT NO: 112) NO: 113)NO: 114) NO: 115) NO: 116) NO: 117) ttggcctc (SEQ ID NO: 71) 45843DQSNLRA RPYTLRL TGYNLTN TSGSLTR QHQVLVR QNATRTK aaGCAACTGT (SEQ ID(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID TAGCTTGCAC NO: 118) NO: 119)NO: 120) NO: 121) NO: 122) NO: 123) tagactag (SEQ ID NO: 72) 45849TSGSLSR RSDHLTQ 
QSGHLAR QKGTLGE QSSDLSR RRDNLHS caCAGGCTCC (SEQ ID(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID AGGAAGGGTT NO: 124) NO: 125)NO: 126) NO: 127) NO: 128) NO: 113) tggcctct (SEQ ID NO: 73) 45848DQSNLRA RPYTLRL TGYNLTN TSGSLTR DQSNLRA AQCCLFH aaAGCAACtG (SEQ ID(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID TTAGCTTGCA NO: 118) NO: 119)NO: 120) NO: 121) NO: 118) NO: 129) Ctagacta (SEQ ID NO: 74) 45872QSGALAR RSDHLSR TSGHLSR RSDALAR DRSHLTR RSDHLSR gtGGGGGCGT (SEQ ID(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID GGGTGGGGTA NO: 95) NO: 98)NO: 103) NO: 130) NO: 116) NO: 98) gaagagga (SEQ ID NO: 75) 45871QSNDLSN RSHHLKA RSDNLSE TSSNRKT N/A N/A ctAATCAGAG (SEQ ID (SEQ ID(SEQ ID (SEQ ID GCCAaaccct NO: 131) NO: 132) NO: 81) NO: 133) tcctggag(SEQ ID NO: 76) 45881 DRSHLTR RSDHLSR RSDNLSE ASKTRKN TSGSLSR QWKSRARtgGCCGTTtG (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID CCCAGGGGGGNO: 116) NO: 98) NO: 81) NO: 134) NO: 124) NO: 135) Cctctttc (SEQ IDNO: 77) 45880 DRSALSR QSGDLTR RSDVLSE TSGHLSR RSANLAR RSDALTQ cgATGGAGaG(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID GTCTGCCAGT NO: 136)NO: 90) NO: 137) NO: 103) NO: 138) NO: 139) Cctcttct (SEQ ID NO: 78)45889 DRSHLTR RSDHLSR RSDNLSE ASKTRKN N/A N/A ttGCCCAGGG (SEQ ID (SEQ ID(SEQ ID (SEQ ID GGGCctcttt NO: 116) NO: 98) NO: 81) NO: 134) cggaaggc(SEQ ID NO: 79) 45888 DRSALSR RSDNLTR QSGHLSR TSGNLTR DLTTLRK N/AccACCGATGG (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID AGAGGTCtgc NO: 136)NO: 140) NO: 141) NO: 93) NO: 142) cagtcctc (SEQ ID NO: 80) 48117DQSNLRA RPYTLRL SGYNLEN TSGSLTR DQSNLRA AQCCLFH aaAGCAACtG (SEQ ID(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID TTAGCTTGCA NO: 118) NO: 119)NO: 253) NO: 121) NO: 118) NO: 129) Ctagacta (SEQ ID NO: 74) 48037DQSNLRA RPYTLRL SGYNLEN TSGSLTR DQSNLRA AQCCLFH aaAGCAACtG (SEQ ID(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID TTAGCTTGCA NO: 118) NO: 119)NO: 253) NO: 121) NO: 118) NO: 129) Ctagacta (SEQ ID NO: 74)

These ZFNs were used to cleave the BCL11A +58 enhancer region in CD34+ cells, and transfected cells were analyzed for relative gamma expression after erythrocyte differentiation. One ZFN pair, 45844/45843, was identified that caused an increase in relative gamma expression as compared to cells treated with a GFP transduction control, or cells treated with other ZFNs. FIG. 8 shows that the site targeted by this ZFN pair partially overlaps that targeted by the most active TALEN pair described in Example 3. Closer inspection of the sequence that is cleaved reveals that it contains a “GATA-1” consensus site (A/T GATA A/G), known to be one of the sequences bound by GATA1 and related transcription factors. See, e.g., Martin and Orkin (1990) Genes Dev 4:1886-1898; Fujiwara et al. (2009) Molecular Cell 36:667-681; Tijssen et al. (2011) Developmental Cell 20:597-609; May et al. (2013) Cell Stem Cell 13:1-15. All designs shown in Table 3 were active.

A region comprising the cleavage site was amplified by PCR and, following amplification, the PCR product was sequenced via MiSeq high throughput sequencing analysis according to manufacturer's instructions (Illumina) for both the ZFN 45843/45844 and TALEN 102852/102853 pairs. The 5 most common genotypes, shown below in Tables 5a and 5b, show cleavage at or near the nuclease target sites and reveal a nuclease-mediated loss of the GATA-1 consensus sequence (in box) in both instances (Table 5a shows SEQ ID NOS 227-281 from top to bottom; Table 5b shows SEQ ID NOS 282-286 from top to bottom). The TALEN pair cleaves slightly downstream of the consensus sequence, potentially resulting in a lower incidence of knockout of this sequence.

TABLE 5a
MiSeq analysis of deletion region for ZFN 45843/45844 pair

Count   Seq Length   Length vs Amplicon   % overall   Alignment
18669   123          NA (0)               53.565      [first alignment line not recovered] CTCCAGGAAGG
 2187   108          shorter (−15)         6.275      ACACACCAGGGTCAATACAACTTTGAAGCTAGTCTAGTGAAAGCTAACAG---------------GCTCAAGGAAGG
 1415   110          shorter (−13)         4.060      ACACACCAGGGTCAATACAACTTTGAAGCTAGTCTAGTGCAAGCTAACAGTTGCT-------------CCAGGAAGG
 1374   122          shorter (−1)          3.942      ACACACCAGGGTCAATACAACTTTGAAGCTAGTCTAGTGCAAGCTAACAGTTGCTTT-ATCACAGGCTCCGGGAAGG
  818   118          shorter (−5)          2.347      ACACACCAGGGTCAATACAACTTTGGAGCTAGTCTAGTGCAAGCTAACAGTTGCC-----CACAGGCTCCAGGAAGG

TABLE 5b
MiSeq analysis of deletion region for TALEN 102852/102853 pair

Count   Seq Length   Length vs Amplicon   % overall   Alignment
19238   123          NA (0)               68.511      [first alignment line not recovered] CAGGAAGG
 1265   115          shorter (−8)          4.505      ACACACCAGGGTCAATACAACTTTGAAGCTAGTCTAGTGCAAGCTACCAGTTGCTTTTATC--------CAGGAAGG
  893   110          shorter (−13)         3.180      ACACACCAGGGTCAATACAACTTTGAAGCTAGTCTAGTGCAAGCTACCAGTTGCT-------------CCAGGAAGG
  524   121          shorter (−2)          1.866      ACACACCAGGGTCAATACAACTTTGAAGCTAGTCTAGTGCAAGCTAACAGTTGCTTTCATCA--GGCTCCAGGAAGG
  449   122          shorter (−1)          1.599      ACACACCAGGGTCAATACAACTTTGAAGCTAGTCTAGTGCAAGCTAACAGTTGCTTTTGTC-CAGGCTCCAGGAAGG
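The 'Length vs Amplicon' values in Tables 5a and 5b can be related to the gapped alignments with a short sketch. It assumes that each '-' character in an alignment marks one deleted base and that the unedited amplicon is 123 bases long, as listed in the first row of each table; the function names are hypothetical.

```python
AMPLICON_LENGTH = 123  # length of the unedited amplicon reported in Tables 5a and 5b


def gap_count(aligned_read):
    """Number of gap characters ('-') in an aligned read, i.e. deleted bases."""
    return aligned_read.count("-")


def length_vs_amplicon(read_length, amplicon_length=AMPLICON_LENGTH):
    """Signed length difference between a read and the unedited amplicon."""
    return read_length - amplicon_length


# Second row of Table 5a: read length 108, reported as 15 bases shorter.
aligned = (
    "ACACACCAGGGTCAATACAACTTTGAAGCTAGTCTAGTGAAAGCTAACAG"
    + "-" * 15
    + "GCTCAAGGAAGG"
)
print(gap_count(aligned), length_vs_amplicon(108))
```

For this row the gap count and the reported length difference agree, which is the consistency check the sketch is meant to show.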

Example 5: TALEN Walk Across the +55 DNAse I Hypersensitive Site in the BCL11A Enhancer Region

To further refine the area required for enhancer activity in the +55 region, a series of TALEN pairs were made to create a series of mutations across this stretch of DNA. The TALEN pairs made in this experiment are shown below in Table 4.

TABLE 4 TALEN pairs that recognize the +55 BCL11A enhancer region # ofSEQ ID Sample SBS# target 5′→3′ RVDs NO: N→C RVD Sequence  1 102876atAATGAATGTCCCAGGCCaa 17 143 NI-NI-NG-NN-NI-NI- NG-NN-NG-HD-HD-HD-NI-NN-NN-HD- HD 102877 ctGCCCCATACCCACTTCcc 16 144 NN-HD-HD-HD-HD-NI-NG-NI-HD-HD-HD- NI-HD-NG-NG-HD  2 102878 atTCTAGGAAGGGAAGTGGgt 17 145NG-HD-NG-NI-NN- NN-NI-NI-NN-NN- NN-NI-NI-NN-NG- NN-NK 102879gtACCAGGAAGGCAATGGGct 17 146 NI-HD-HD-NI-NN-NN- NI-NI-NN-NN-HD-NI-NI-NG-NN-NN-NK  3 102880 gtGGGTATGGGGCAGCCCAtt 17 147 NN-NN-NN-NG-NI-NG-NN-NN-NN-NN- HD-NI-NN-HD-HD- HD-NI 102881 atTGCATCATCCTGGTACca 16 148NG-NN-HD-NI-NG- HD-NI-NG-HD-HD- NG-NN-HD  4 102882 ctTCCTGGTACCAGGATGAtg17 149 NG-HD-HD-NG-NN- NN-NG-NI-HD-HD-NI- NN-NN-NI-NG-NN-NI 102883gtGGGGAGCTCACAGCCTCca 17 150 NN-NN-NN-NN-NI- NN-HD-NG-HD-NI-HD-NI-NN-HD-HD- NG-HD  5 102884 atGATGCAATGCTTGGAGGct 17 151NN-NI-NG-NN-HD- NI-NI-NG-NN-HD- NG-NG-NN-NN-NI- NN-NK 102885gtGTGCCCTGAGAAGGTGGgg 17 152 NN-NG-NN-HD-HD- HD-NG-NN-NI-NN-NI-NI-NN-NN-NG- NN-NK  6 102886 atGCTTGGAGGCTGTGAGCtc 17 153NN-HD-NG-NG-NN- NN-NI-NN-NN-HD- NG-NN-NG-NN-NI- NN-HD 102887atCACAGGGTGTGCCCTGAga 17 154 HD-NI-HD-NI-NN-NN- NN-NG-NN-NG-NN-HD-HD-HD-NG-NN- NI  7 102888 ctCCCCACCTTCTCAGGGCac 17 155HD-HD-HD-HD-NI- HD-HD-NG-NG-HD- NG-HD-NI-NN-NN- NN-HD 102889ctGGACAGAGGGGTCCCACaa 17 156 NN-NN-NI-HD-NI- NN-NI-NN-NN-NN-NN-NG-HD-HD-HD- NI-HD  8 102890 ctTCTCAGGGCACACCCTGtg 17 157NG-HD-NG-HD-NI- NN-NN-NN-HD-NI- HD-NI-HD-HD-HD- NG-NK 102891ctGGGCTGGACAGAGGGGTc 17 158 NN-NN-NN-HD-NG- c NN-NN-NI-HD-NI-NN-NI-NN-NN-NN- NN-NG  9 102892 ctGTGATCTTGTGGGACCcc 16 159NN-NG-NN-NI-NG- HD-NG-NG-NN-NG- NN-NN-NN-NI-HD- HD 102893atGCACACCCAGGCTGGGct 16 160 NN-HD-NI-HD-NI-HD- HD-HD-NI-NN-NN-HD-NG-NN-NN-NK 10 102894 ctTGTGGGACCCCTCTGTCca 17 161 NG-NN-NG-NN-NN-NN-NI-HD-HD-HD- HD-NG-HD-NG-NN- NG-HD 102895 gtGCCGACCAAGCACACAAga 17162 NN-HD-HD-NN-NI- HD-HD-NI-NI-NN-HD- NI-HD-NI-HD-NI-NI 11 102896ctGTCCAGCCCAGCCTGGGtg 17 163 NN-NG-HD-HD-NI- NN-HD-HD-HD-NI-NN-HD-HD-NG-NN- NN-NK 102897 
atCAGTGCCGACCAAGCACac 17 164HD-NI-NN-NG-NN- HD-HD-NN-NI-HD- HD-NI-NI-NN-HD-NI- HD 12 102898ctGGGTGTGCATCTTGTGTgc 17 165 NN-NN-NN-NG-NN- NG-NN-HD-NI-NG-HD-NG-NG-NN-NG- NN-NG 102899 ctACCGCGACCCCTATCAGtg 17 166NI-HD-HD-NN-HD- NN-NI-HD-HD-HD- HD-NG-NI-NG-HD-NI- NK 13 102902gtAGGGAGTTGTCGGCACAca 17 167 NI-NN-NN-NN-NI- NN-NG-NG-NN-NG-HD-NN-NN-HD-NI- HD-NI 102903 ttGGGGACCGCTCACAGGAca 17 168NN-NN-NN-NN-NI- HD-HD-NN-HD-NG- HD-NI-HD-NI-NN-NN- NI 14 102904ctGCTGCATGTCCTGTGAgc 16 169 NN-HD-NG-NN-HD- NI-NG-NN-NG-HD-HD-NG-NN-NG-NN- NI 102905 ctGAAGGCTGGGCACAGCCtt 17 170 NN-NI-NI-NN-NN-HD-NG-NN-NN-NN- HD-NI-HD-NI-NN-HD- HD 15 102906 gtCCCCAAGGCTGTGCCCAgc 17171 HD-HD-HD-HD-NI-NI- NN-NN-HD-NG-NN- NG-NN-HD-HD-HD- NI 102907ctGTCAGAAGAGGCCCTGGac 17 172 NN-NG-HD-NI-NN- NI-NI-NN-NI-NN-NN-HD-HD-HD-NG-NN- NK 16 102912 ttCTGACAGGCCCTGCTGGtt 17 173HD-NG-NN-NI-HD-NI- NN-NN-HD-HD-HD- NG-NN-HD-NG-NN- NK 102913gtGGTGCGTGGAGATAATGcc 17 174 NN-NN-NG-NN-HD- NN-NG-NN-NN-NI-NN-NI-NG-NI-NI-NG- NK 17 102914 ctGCTGGTTATCACTGTTGgc 17 175NN-HD-NG-NN-NN- NG-NG-NI-NG-HD-NI- HD-NG-NN-NG-NG- NK 102915ctGGGCACAGAAGTGGTGCgt 17 176 NN-NN-NN-HD-NI- HD-NI-NN-NI-NI-NN-NG-NN-NN-NG-NN- HD 18 102916 ttGGCATTATCTCCACGCAcc 17 177NN-NN-HD-NI-NG- NG-NI-NG-HD-NG- HD-HD-NI-HD-NN- HD-NI 102917gtGACCCAGCAGCCCTGGGca 17 178 NN-NI-HD-HD-HD- NI-NN-HD-NI-NN-HD-HD-HD-NG-NN-NN- NK 19 102918 atCTCCACGCACCACTTCTgt 17 179HD-NG-HD-HD-NI- HD-NN-HD-NI-HD- HD-NI-HD-NG-NG- HD-NG 102919ctCCTTAAGGTGACCCAGCag 17 180 HD-HD-NG-NG-NI-NI- NN-NN-NG-NN-NI-HD-HD-HD-NI-NN-HD 20 102920 gtGCCCAGGGCTGCTGGGTca 17 181 NN-HD-HD-HD-NI-NN-NN-NN-HD-NG- NN-HD-NG-NN-NN- NN-NG 102921 ctATGTAGACGGGTGTGTGgc 17182 NI-NG-NN-NG-NI- NN-NI-HD-NN-NN- NN-NG-NN-NG-NN- NG-NK 21 102922gtCACCTTAAGGAGCCACAca 17 183 HD-NI-HD-HD-NG- NG-NI-NI-NN-NN-NI-NN-HD-HD-NI-HD-NI 102923 gtCAGACCCCAAGCAGGAAgg 17 184 HD-NI-NN-NI-HD-HD-HD-HD-NI-NI-NN-HD- NI-NN-NN-NI-NI

The TALENs were introduced into CD34+ cells as described above and the cells were induced to differentiate into erythroid cells as described above. Taqman® analysis was performed as described and several sites were identified that caused an increase in relative gamma expression (FIG. 9). The cleavage sites are displayed in FIG. 10. All designs shown in Table 4 were active. Interestingly, one of the TALEN pairs which drove an increase in gamma globin mRNAs cleaves at another GATA-1 consensus sequence (cleavage site 17 on FIG. 10).

Example 6: ZFNs Directed to the +55 DNAse I Hypersensitive Site in BCL11A

Similar to Example 4, a set of ZFNs were made to probe the +55 DNAse I hypersensitive region. The ZFNs are shown below in Table 6.

TABLE 6 +55 enhancer region specific ZFNs: designs and targets SBS #(target site, 5′- Design 3′) F1 F2 F3 F4 F5 F6 46156 TSSNRKT AACNRNAWKCQLPI DRSNLTR RSDHLSQ DSSTRKK (tgGCCTGGG (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID (SEQ ID ACATTCATTA NO: 133) NO: 185) NO: 186) NO: 187)NO: 188) NO: 189) Tttagccac; SEQ ID NO: 232) 46158 QSGALAR RKYYLAKRSDNLSV RSAHLSR QSGNLAR ARWSLGK (tcTAGGAAG (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID (SEQ ID GGAAGTGGGT NO: 95) NO: 190) NO: 191) NO: 109)NO: 192) NO: 193) Atggggcag; SEQ ID NO: 233) 46163 WKCQLPI DRSNLTRRSDHLSQ DSSTRKK RPYTLRL QSGNLAR (taGAATTGG (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID (SEQ ID CCTGGGACAT NO: 186) NO: 187) NO: 188) NO: 189)NO: 119) NO: 192) Tcattattt; SEQ ID NO: 234) 46164 RSAHLSR RSDALTQTSGHLSR RSDALAR QSGNLAR RQEHRVA (gaAGGGAAG (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID (SEQ ID TGGGTATGGG NO: 109) NO: 139) NO: 103) NO: 130)NO: 192) NO: 194) Gcagcccat; SEQ ID NO: 235 46180 DRSHLTR QSGNLARRSDSLSA DNSNRIK RSDVLSE SPSSRRT (tcATCCTGg (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID (SEQ ID TACCAGGAAG NO: 116) NO: 192) NO: 195) NO: 196)NO: 137) NO: 197) GCaatgggc; SEQ ID NO: 236) 46181 RSDNLAR WQSSLIVDRSHLTR QSGHLSR QSSDLSR LKWNLRT (gcAATGCTt (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID (SEQ ID GGAGGCTGTG NO: 198) NO: 199) NO: 116) NO: 141)NO: 128) NO: 200) AGctcccca; SEQ ID NO: 237) 46188 PCRYRLD RSANLTRRSDHLSR TSGHLSR QSGNLAR QKPWRTP (ccTGAGAAG (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID (SEQ ID GTGGGGAGCT NO: 201) NO: 202) NO: 98) NO: 103)NO: 192) NO: 203) Cacagcctc; SEQ ID NO: 238) 46189 QSSHLTR RSDALARYRSSLKE TSGNLTR RSDTLSA DKSTRTK (acACCCTGt (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID (SEQ ID GATCTTGTGG NO: 204) NO: 130) NO: 205) NO: 93)NO: 206) NO: 207) GAcccctct; SEQ ID NO: 239) 46208 DRSALAR RSDHLSRQGAHLGA QSSHLTR QSSDLTR N/A (ggGCTGGAc (SEQ ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID AGAGGGGTCc NO: 208) NO: 98) NO: 209) NO: 204) NO: 210)cacaagatc; SEQ ID NO: 240) 46209 RSDSLLR SASARWW TQSNLRM RNASRTR DRSHLTRRLDWLPM (gcCTGGGTG (SEQ ID (SEQ ID (SEQ ID (SEQ 
ID (SEQ ID (SEQ IDTGCATCTTGT NO: 211) NO: 212) NO: 213) NO: 214) NO: 116) NO: 106)Gtgcttggt; SEQ ID NO: 241) 46216 RSDHLSR QGAHLGA QSSHLTR QSSDLTR RSDHLSQDSSHRTR (caGGCTGGG (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDCTGGAcAGAG NO: 98) NO: 209) NO: 204) NO: 210) NO: 188) NO: 86)GGgtcccac; SEQ ID NO: 242) 46217 RSDHLSQ RRSDLKR RSDSLLR SASARWW TQSNLRMRNASRTR (gtGTGCATC (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDTTGTGtGCTT NO: 188) NO: 215) NO: 211) NO: 212) NO: 213) NO: 214)GGtcggcac; SEQ ID NO: 243) 46226 RSDNLST DNSNRIN QSGDLTR QSGNLHV DRSDLSRDSSTRRR (gtGCCGACC (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDAAGCACACAA NO: 216) NO: 217) NO: 90) NO: 218) NO: 219) NO: 220)Gatgcacac; SEQ ID NO: 244) 46228 LKQNLDA RSAHLSR QSGALAR RSDDLTR LKQNLDARSHHLKA (atAGGGGTc (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDGCGGTAGGGA NO: 108) NO: 109) NO: 95) NO: 83) NO: 108) NO: 132)GTtgtcggc; SEQ ID NO: 245) 46229 QSGDLTR QSGNLHV DRSDLSR DSSTRRR RSDNLSETSSNRKT (ccTATCAGt (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDGCCGACCAAG NO: 90) NO: 218) NO: 219) NO: 220) NO: 81) NO: 133)CAcacaaga; SEQ ID NO: 246) 46230 DRSHLSR DRSALAR TSGSLSR QAGHLAK QSGALARRSDDLTR (tcGCGGTAg (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDGGAGTTGTCG NO: 221) NO: 208) NO: 124) NO: 222) NO: 95) NO: 83)GCacacact; SEQ ID NO: 247) 46240 RSDSLSV QSGDLTR QSGDLTR TSHNRNA RSDHLSQDNSNRIN (ctCACAGGa (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDCATGCAGCAG NO: 223) NO: 90) NO: 90) NO: 224) NO: 188) NO: 217)TGtgtgccg; SEQ ID NO: 248) 46241 DRSNLSS RSHSLLR QSSDLSR RSDNLSV DNRDRIKN/A (tcCCCAAGG (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID CTGTGCCCagNO: 225) NO: 226) NO: 128) NO: 191) NO: 227) ccttcagtg; SEQ ID NO: 249)46246 ASKTRTN RNASRTR RSDNLSV YSSTRNS QSSDLSR RSDALAR (gtGTGGCTc (SEQ ID(SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID CTTAAGGTGA NO: 228) NO: 214)NO: 191) NO: 229) NO: 128) NO: 130) CCcagcagc; SEQ ID NO: 250) 46247RSDNLAR QSTPRNT WPDYLPT DRSALAR N/A N/A (ccGTCTACA (SEQ ID (SEQ ID(SEQ ID (SEQ ID TAGAGgccct NO: 
198) NO: 230) NO: 231) NO: 208)tcctgcttg; SEQ ID NO: 251)

The +55 region specific ZFNs were tested for cleavage activity using the Cel-I assay and found to be active in K562 cells. All designs shown in Table 6 were active, and the ZFN-induced gene modification, described as % NHEJ (non-homologous end joining), found for each pair is listed below in Table 7.

TABLE 7
Cleavage activity of BCL11A +55 enhancer region specific ZFN in K562 cells

ZFN 1 (SBS#)   ZFN 2 (SBS#)   % NHEJ
46158          46156          30.36
46164          46163          21.90
46181          46180          15.21
46189          46188          12.76
46209          46208          27.96
46217          46216          16.63
46228          46226          37.79
46230          46229          37.23
46241          46240          24.97
46247          46246          17.30
GFP Transduction control       0.00

The CD34+ cells were transfected with the ZFN pairs as described for Example 4, and then differentiated into erythrocytes as above. The ZFNs shown in Table 6 bound to and modified the BCL11A +55 enhancer region.

Example 7: Increasing Activity of +58 Specific ZFN Pairs

The ZFNs targeting the +58 region were further refined by shifting the target sequences, altering finger identity and using alternate linkers between the zinc finger DNA binding domain and the FokI cleavage domains.

ZFN pairs were made to target a sequence very close to the cleavage site of the 45843/45844 ZFN pair. The pairs are shown below in Table 8, and the locations of their binding sites are shown in FIG. 12A.

TABLE 8 ZFNs targeting the +58 enhancer region SBS # (target site, 5′-Design 3′) F1 F2 F3 F4 F5 F6 linker 46880 RSDHLTQ QSGHLAR QKGTLGEQSSDLSR RRDNLHS N/A L7a caCAGGCTC (SEQ ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID CAGGAAGGg NO: 125) NO: 126) NO: 127) NO: 128) NO: 113) tttggcctct (SEQ ID NO: 73) 47923 RSDHLTQ QSGHLAR QKGTLGE QSSDLSR RRDNLHS N/A L0caCAGGCTC (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID CAGGAAGGg NO: 125)NO: 126) NO: 127) NO: 128) NO: 113) tttggcctc t (SEQ ID NO: 73) 50679RSDHLTQ QSGHLAR QKGTLGE QSSDLSR RRDNLHS N/A L0[-1] caCAGGCTC (SEQ ID(SEQ ID (SEQ ID (SEQ ID (SEQ ID CAGGAAGGg NO: 125) NO: 126) NO: 127)NO: 128) NO: 113) tttggcctc t (SEQ ID NO: 73) 50680 RSDHLTQ QSGHLARQKGTLGE QSSDLSR RRDNLHS N/A L0[-2] caCAGGCTC (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID CAGGAAGGg NO: 125) NO: 126) NO: 127) NO: 128) NO: 113)tttggcctc t (SEQ ID NO: 73) 46923 QKGTLGE QSGSLTR TGYNLTN TSGSLTRQHQVLVR QNATRTK L7e4 aaGCAACTG (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID TTAGCttGC NO: 127) NO: 252) NO: 120) NO: 121) NO: 122) NO: 123)ACTAgacta g (SEQ ID NO: 72) 46999 RSDHLTQ QSGHLAR QKGTLGE QSSDLSRRRDNLHS N/A L7c5 caCAGGCTC (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDCAGGAAGGg NO: 125) NO: 126) NO: 127) NO: 128) NO: 113) tttggcctct (SEQ ID NO: 73) 45844 LRHHLTR RRDNLHS RSDHLSN DSRSRIN DRSHLTR QSGTRKTL7a tcACAGGCT (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID CCAGGAAGGNO: 112) NO: 113) NO: 114) NO: 115) NO: 116) NO: 117) GTttggcct c (SEQID NO: 71) 47021 LRHHLTR RRDNLHS RSDHLSN DSRSRIN DRSHLTR QSGTRKT L7c5tcACAGGCT (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID CCAGGAAGGNO: 112) NO: 113) NO: 114) NO: 115) NO: 116) NO: 117) GTttggcct c (SEQID NO: 71) 45843 DQSNLRA RPYTLRL TGYNLTN TSGSLTR QHQVLVR QNATRTK L7aaaGCAACTG (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID TTAGCTTGCNO: 118) NO: 119) NO: 120) NO: 121) NO: 122) NO: 123) ACtagacta g (SEQID NO: 72) 46801 DQSNLRA RPYTLRL SGYNLEN TSGSLTR DQSNLRA AQCCLFH L7aaaAGCAACt (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID GTTAGCTTGNO: 118) NO: 119) 
NO: 253) NO: 121) NO: 118) NO: 129) CACtagacta (SEQ ID NO: 74) 46786 DQSNLRA RPYTLRL SGYNLEN TSGSLTR DQSNLRA AQCCLFHL0 aaAGCAACt (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID GTTAGCTTGNO: 118) NO: 119) NO: 253) NO: 121) NO: 118) NO: 129) CACtagacta (SEQ ID NO: 74) 46934 DQSNLRA RPYTLRL SGYNLEN TSGSLTR DQSNLRA AQCCLFHL7c5 aaAGCAACt (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID GTTAGCTTGNO: 118) NO: 119) NO: 253) NO: 121) NO: 118) NO: 129) CACtagacta (SEQ ID NO: 74) 46816 DQSNLRA RPYTLRL SGYNLEN TSGSLTR DQSNLRA AQCCLFHL8c4 aaAGCAACt (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID GTTAGCTTGNO: 118) NO: 119) NO: 253) NO: 121) NO: 118) NO: 129) CACtagacta (SEQ ID NO: 74) 50670 DQSNLRA RPYTLRL SGYNLEN TSGSLTR DQSNLRA AQCCLFHL0[+9] aaAGCAACt (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ IDGTTAGCTTG NO: 118) NO: 119) NO: 253) NO: 121) NO: 118) NO: 129)CACtagact a (SEQ ID NO: 74) 50671 DQSNLRA RPYTLRL SGYNLEN TSGSLTRDQSNLRA AQCCLFH L0[+7] aaAGCAACt (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID GTTAGCTTG NO: 118) NO: 119) NO: 253) NO: 121) NO: 118) NO: 129)CACtagact a (SEQ ID NO: 74) 50672 DQSNLRA RPYTLRL SGYNLEN TSGSLTRDQSNLRA AQCCLFH L0[+5] aaAGCAACt (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID GTTAGCTTG NO: 118) NO: 119) NO: 253) NO: 121) NO: 118) NO: 129)CACtagact a (SEQ ID NO: 74) 48117 DQSNLRA RPYTLRL SGYNLEN TSGSLTRDQSNLRA AQCCLFH L7c5 aaAGCAACt (SEQ ID (SEQ ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID GTTAGCTTG NO: 118) NO: 119) NO: 253) NO: 121) NO: 118) NO: 129)CACtagact a (SEQ ID NO: 74) 50674 DQSNLRA RPYTLRL SGYNLEN TSGSLTRDQSNLRA AQCCLFH L0[+11] aaAGCAACt (SEQ ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID GTTAGCTTG NO: 118) NO: 119) NO: 253) NO: 121) NO: 118)NO: 129) CACtagact a (SEQ ID NO: 74) 50676 DQSNLRA RPYTLRL SGYNLENTSGSLTR DQSNLRA AQCCLFH L0[+7] aaAGCAACt (SEQ ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID GTTAGCTTG NO: 118) NO: 119) NO: 253) NO: 121) NO: 118)NO: 129) CACtagact a (SEQ ID NO: 74) 48037 DQSNLRA RPYTLRL SGYNLENTSGSLTR DQSNLRA AQCCLFH L7a aaAGCAACt (SEQ 
ID (SEQ ID (SEQ ID (SEQ ID(SEQ ID (SEQ ID GTTAGCTTG NO: 118) NO: 119) NO: 253) NO: 121) NO: 118)NO: 129) CACtagact a (SEQ ID NO: 74)

As can be seen in FIG. 12, the binding site of the new pairs is located one base pair closer to the center of the GATA-1 consensus sequence. All pairs were active for binding and cleaving their targets in the genome as gauged by an initial screen in K562 cells. mRNAs encoding the ZFNs were electroporated into CD34+ cells and then the cells were differentiated into the erythroid lineage as described in Example 2. To analyze relative gamma globin expression, the ratios of mRNAs encoding gamma globin and beta globin following ZFN treatment were determined by Taqman® analysis at 14 days following ZFN introduction. The results (FIG. 12B) demonstrated that the ZFN pairs targeting the shifted binding sites had a greater influence on the expression of gamma globin.

Next, the proteins were made with alternate linker types to test the effect on the Bcl11a proteins. Similar sets of ZFNs were made that comprised the same helices in the same fingers but where each contained a different linker between the ZFP DNA binding domain and the FokI nuclease. For example, ZFNs 46801, 46786, 46816 and 46934 have the same ZFP DNA binding domain, but are linked to the nuclease domain using the L7a, L0, L8c4 and L7c5 linkers, respectively. Similarly, ZFNs 45844 and 47021 have the same DNA binding domain, but 45844 has the L7a linker while 47021 uses the L7c5 linker. In addition, 46880, 47923, 50679 and 50680 have the same DNA binding domains, but, as listed in Table 8, 46880 uses the L7a linker; 47923 has the L0 linker; 50679 uses L0[−1] and 50680 has L0[−2]. The linkers are shown in FIG. 14 and FIG. 17.

As shown in FIG. 13, the ZFNs were tested in various pairs, where one set of pairs targeted the binding site typified by the one targeted by the 45843/45844 pair, and a second set targeted the site typified by the one targeted by 46801/46880. The pairs tested are shown below in Table 9 along with the percent cleavage (NHEJ) as measured by sequence analysis at either Day 0 (D0) or Day 14 (D14). Each data point is a mean of three replicates.

TABLE 9
Effect of Linker on ZFN cleavage activity

Pair          Linkers    Target type   NHEJ, D0   NHEJ, D14   HGB/HBA ratio
45843/45844   L7a/L7a    45843/45844   78%        73%         7.65
45843/47021   L7a/L7c5   45843/45844   82%        72%         7.57
46801/46880   L7a/L7a    46801/46880   87%        64%         8.33
46801/47923   L7a/L0     46801/46880   82%        65%         12.07
46786/46880   L0/L7a     46801/46880   71%        40%         6.06
46934/47923   L7c5/L0    46801/46880   80%        58%         83
GFP                                                           2.71

Additional tests were designed to measure the number of indel-containing edits that destroyed the GATA site in the target. Shown in Table 10 below are combinations of ZFNs with various linkers and the effect the linkers have in increasing the percent of indels overall and the increase in indels that result in the loss of the GATA binding site.

TABLE 10
Exemplary linker activity

Left ZFN            Right ZFN           % indel, no   % total_indels   % of no GATA/
SBS#    Linker Type SBS#    Linker Type GATA left                      total indel
50670   L0 [+9]     47923   L0          28.0          30.8             0.91
50671   L0 [+7]     47923   L0          24.5          26.7             0.92
50672   L0 [+5]     47923   L0          30.3          33.0             0.92
46801   L7a         47923   L0          27.6          30.4             0.91
46816   L8c4        47923   L0          27.7          29.9             0.93
46934   L7c5        47923   L0          43.8          47.8             0.92
50670   L0 [+9]     50679   L0 [−1]     30.1          32.6             0.92
50671   L0 [+7]     50679   L0 [−1]     23.3          25.3             0.92
50672   L0 [+5]     50679   L0 [−1]     31.1          34.2             0.91
46801   L7a         50679   L0 [−1]     16.1          18.0             0.89
46816   L8c4        50679   L0 [−1]     33.8          36.2             0.93
46934   L7c5        50679   L0 [−1]     23.7          25.4             0.94
50670   L0 [+9]     50680   L0 [−2]     16.9          18.1             0.94
50671   L0 [+7]     50680   L0 [−2]     16.9          17.9             0.94
50672   L0 [+5]     50680   L0 [−2]     23.2          24.9             0.93
46801   L7a         50680   L0 [−2]     24.2          26.0             0.93
46816   L8c4        50680   L0 [−2]     21.3          23.2             0.92
46934   L7c5        50680   L0 [−2]     25.1          27.0             0.93
46801               47923               16.5          19.1             0.86
GFP                                     1.4           0.8              1.64

As can be seen in the table above, refining of the original pair, 46801 (L7a)/47923 (L0), whose activity was measured in this experiment to be 19% overall indel formation, with 86% of those measured indels having a destroyed GATA binding site, can lead to an overall increase in cleavage (indel) activity, and an overall increase in the percent of indels that lead to destruction of GATA. See, for example, 46934 (L7c5)/47923 (L0), where 47.8% total indels were observed and 92% of those indels had a destroyed GATA site. These and other linkers described (see FIG. 17) may be incorporated into the ZFNs to increase and/or refine activity.
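The last column of Table 10 appears to be the fraction of all indels that leave no GATA site. A minimal sketch of that arithmetic follows; the function name is hypothetical, and because the tabulated percentages are already rounded, recomputing from them may differ from the listed value by 0.01 for some rows (and more for the low-count GFP control).

```python
def gata_loss_fraction(pct_indel_no_gata, pct_total_indels):
    """Fraction of indels that destroy the GATA site, rounded to two decimals."""
    return round(pct_indel_no_gata / pct_total_indels, 2)


# Values taken from Table 10
print(gata_loss_fraction(43.8, 47.8))  # 46934 (L7c5) / 47923 (L0)
print(gata_loss_fraction(16.5, 19.1))  # original pair 46801 / 47923
```

These two rows correspond to the 92% and 86% figures quoted in the text above.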

Thus, alternate linkers may be used with the ZFN pairs described herein to cleave Bcl11a and increase gamma hemoglobin relative to alpha hemoglobin.

Example 8: In Vivo Administration

Compositions including cells (e.g., HSCs and/or RBC precursor cells), proteins (e.g., nucleases) and/or polynucleotides (e.g., encoding nucleases) as described herein are administered to a subject, for example a subject with a hemoglobinopathy, essentially as described in U.S. Pat. Nos. 7,837,668; 8,092,429; U.S. Patent Publication No. 20060239966; U.S. Pat. Nos. 6,180,613; 6,503,888 and/or U.S. Pat. Nos. 6,998,118 and 7,101,540 to provide therapy for a subject in need thereof.

In addition, the cells are studied for use in large scale production of edited LT-HSC. Bulk CD34+ cells are pre-stimulated with cytokines comprising Stemspan™ CC110, Flt-3 ligand, SCF, and TPO and all combinations thereof in concentrations from 10 ng/mL to 1000 ng/mL. Pre-stimulation may require exposure times of 24 hours, up to 48 hours, or up to 72 hours. For clinical-scale HSPC transfection, any high capacity system may be used (e.g., Maxcyte GT Flow Transfection System).

For ex vivo therapies, edited cells (e.g., HSCs) are subjected to colony forming assays in methylcellulose medium to confirm the frequency of pluripotent cells and to verify that the colonies possess the desired genetic editing at the expected frequencies. The methylcellulose studies are carried out using methods known in the art (see, for example, Keller et al (1993) Mol Cell Bio 13(1):473).

To further ensure the engraftability of the BCL11A-edited cells (e.g., HSC), the cells are engrafted into a relevant mouse model and/or a non-human primate model. Engraftment in these animals is done according to methods known in the art. See, for example, Holt et al. (2010) Nat Biotech 28, 839-847, Mo et al (2009) Retrovirology 6:65 and Peterson et al (2013) J. Med Primatol 42: 237. Engraftment with (1020 cGy irradiation) or without (200 cGy irradiation) myeloablative preconditioning is used to investigate optimum engraftment and expansion conditions for stem cell transplantation.

Example 9: In Vivo Administration and Engraftment

As described above, CD34+ human cells were treated with mRNAs encoding the +55 enhancer specific ZFNs and then engrafted into NSG mice. CD34+ cells were obtained from healthy human volunteers. In some cases, CD34+ mobilization strategies were done, using either G-CSF (Neupogen®) or G-CSF + Plerixafor (Mozobil®) prior to apheresis. The G-CSF was administered daily for the four days prior to apheresis according to manufacturer's instructions, and if Plerixafor was used, it was administered on the final evening prior to harvest, again according to manufacturer's instructions. The apheresis was performed by standard methods. CD34+ cells were enriched from the mobilized PBMC leukopaks using a Miltenyi CliniMACs system by standard methods and according to manufacturer's instructions.

Capped and poly-adenylated mRNAs encoding the ZFNs were synthesized using Ambion mMessage mMachine® T7 ultra kit as instructed by the manufacturer and then electroporated into the CD34+ cells using either a Maxcyte GT system or a BTX ECM830 electroporator, both according to manufacturer's instructions.

NOD.Cg-Prkdc^(scid) Il2rg^(tm1Wjl)/SzJ mice were used to receive the CD34+ transplant. One day (16-24 hours) prior to implantation, the mice were subjected to sublethal irradiation (300 RAD). The ZFN-treated CD34+ cells from above were transplanted into the irradiated mice through a tail vein injection, where 1 million cells in 0.5 mL PBS-0.1% BSA were given per mouse.

For this experiment, CD34+ cells were electroporated with mRNAs encoding either the 45843/45844 pair (electroporation #1) or the 46801/46880 pair (electroporation #2). In both cases, GFP was used as a control. Following transplantation into the mice, samples were taken at either 4 or 16 weeks post-transplant to observe the level of marking in cells. At week 4, up to approximately 4% of the cells in the peripheral blood were human cells from both electroporations (see FIG. 15A). Genome editing (indels) in these cells was about 30-40% (FIG. 15B).

At week 16 post transplantation, the level of editing was measured in human pan-myeloid, B-cells, erythroid and stem cells in the mice. In these experiments, the cells from both electroporations were pooled. The data (FIG. 16) indicated that 40-50% gene editing was detected in all human cell populations analyzed. This experiment demonstrates that transplanted CD34+ cells are maintained and differentiate while maintaining the gene editing at the BCL11A locus.

All patents, patent applications and publications mentioned herein are hereby incorporated by reference in their entirety.

Although disclosure has been provided in some detail by way of illustration and example for the purposes of clarity of understanding, it will be apparent to those skilled in the art that various changes and modifications can be practiced without departing from the spirit or scope of the disclosure. Accordingly, the foregoing descriptions and examples should not be construed as limiting.

1-19. (canceled)
20. A DNA-binding protein comprising a TALE-effector protein (TALE), wherein the TALE protein comprises a plurality of TALE repeat units, each repeat unit comprising a hypervariable diresidue region (RVD), wherein the RVDs of the TALE repeats units are shown in a single row of Table 1, Table 2 or Table 4.
21. A fusion protein comprising a TALE protein of claim 20 and a wild-type or engineered cleavage domain or cleavage half-domain.
22. A polynucleotide encoding one or more proteins of claim 20.
23. An isolated cell comprising one or more proteins according to claim 20.
24. An isolated cell comprising one or more polynucleotides according to claim 22.
25. The cell of claim 23, wherein the cell is a hematopoietic stem cell.
26. A kit comprising a protein according to claim 20.
27. A method of altering globin gene expression in a cell, the method comprising: introducing, into the cell, one or more polynucleotides according to claim 22, under conditions such that the one or more proteins are expressed and expression of the globin gene is altered.
28. The method of claim 27, wherein expression of the globin gene is increased.
29. The method of claim 28, wherein the globin gene is a gamma globin or beta globin gene.
30. The method of claim 27, further comprising integrating a donor sequence into the genome of the cell.
31. The method of claim 30, wherein the donor sequence is introduced to the cell using a viral vector, as an oligonucleotide or on a plasmid.
32. The method of claim 27, wherein the cell is selected from the group consisting of a red blood cell (RBC) precursor cell and a hematopoietic stem cell.
33. The method of claim 30, wherein the donor sequence comprises a transgene under the control of an endogenous or exogenous promoter.
34-39. (canceled)
40. A method of treating a patient in need of an increase in globin gene expression, the method comprising administering to the patient the pharmaceutical preparation of claim 19 in an amount sufficient to increase the globin gene expression in the patient.
41. The method of claim 40, wherein the patient is known to have, is suspected of having, or is at risk of developing a globinopathy.
42. The method of claim 41, wherein the globinopathy is a thalassemia or sickle cell disease.
43. The method of claim 42, wherein the thalassemia is β-thalassemia.
44. A pharmaceutical composition comprising the cell of claim 23.
45. The cell of claim 25, wherein the hematopoietic stem cell is a CD34+ cell.